FURTHER EVIDENCE ON RESPONSE SETS AND 
TEST DESIGN 


LEE J. CRONBACH 1 
University of Illinois 

When a person takes an objective test, he may bring to the 
test a number of test-taking habits which affect his score. 
Personal ways of responding to test items of a given form 
(e.g., the tendency to say “agree” when given the alternatives 
“agree”-“uncertain”-“disagree”) are frequently a source of 
invalidity. In 1946, the writer (4) assembled evidence demonstrating 
that these “response sets” are present in a wide variety 
of tests. Since that time, much new evidence has come to 
light, and it is now possible to examine more completely the 
nature of response sets. While much of the material to be 
reported is new, evidence has also been drawn from scattered 
publications which were overlooked in the earlier review. 
Material on response sets is to be found in a great many sorts of 
studies, discussed under many names. Particular attention 
should be drawn to the early reports of Lorge (15) and 
Goodfellow (6) on this topic. 

As our earlier report demonstrated, response sets have been 
identified in tests of ability, personality, attitude, and interest, 
and in rating scales. Among the most widely found sets are 
acquiescence (tendency to say “True,” “Yes,” “Agree,” etc.), 
evasiveness (tendency to say “Indifferent," “Uncertain,” 
etc.), and similar biases in favor of a particular response when 
certain fixed alternatives are offered. Other sets include the 
tendency to work for speed rather than accuracy, the tendency 
to guess when uncertain, the tendency to check many items 
in a checklist, etc. Response sets become most influential as 
items become difficult or ambiguous. Individual differences 
in response sets are consistent throughout a given test, as shown 
by split-half coefficients. Response sets dilute a test with factors 
not intended to form part of the test content, and so reduce 
its logical validity. These sets may also reduce the test's empirical 
validity. Response sets tend to reduce the range of 
individual differences in score. 

1 This study was assisted by funds from the Bureau of Research and Service, 
College of Education. 

The pattern of this discussion is as follows: First, many 
studies are cited which bolster the conclusion that response 
sets are widely found, and are particularly influential when 
a test is difficult. These new sources confirm earlier findings 
and do not modify them. The significant new material in this 
section relates to two multiple-choice tests, and confirms the 
hypothesis that this form of test is nearly free from response 
sets. The second section of the report deals with the nature 
of response sets. Questions considered are: Can performance 
be altered by special directions or training to avoid response 
biases? Are response sets consistent traits, so that a person 
shows a similar set on different tests? Are response sets correlated 
with other aspects of personality? These studies deal 
particularly with the question whether response sets are due 
to a transient mind-set and are therefore only a nuisance in 
testing, or whether they may provide data on important variables. 
The third and final section reviews methods used to 
control the influence of response sets on validity, and discusses 
what test constructors can do to design better tests. 

Evidence that Response Sets Exist 

It is scarcely necessary to marshal further evidence that 
reliable individual differences in response sets exist. Yet the 
widespread use of test forms which permit response sets indicates 
that their existence is not adequately appreciated. It is not 
only the old tests (Seashore, Bernreuter, Thurstone attitude, 
Strong) that suffer from response sets. New tests appear continually, 
especially tests of attitude and personality, whose 
forms invite response sets. The writer has routinely requested 
graduate students to analyze their data for response sets whenever 
their research employed tests with fixed response categories 
(A-U-D, Yes-No-?, etc.). Never has such an analysis 
failed to disclose individual patterns of response, statistically 
consistent from item to item. 

The most effective simple design to demonstrate response 
sets is to obtain a score for each person on the suspected 
response set. Thus, Lorge tested the existence of “gen-like,” 
or acquiescence on the Strong test, by counting how many 
items each person marked “L.” The split-half or Kuder-Richardson 
reliability of the response-set score can then be 
computed. Table 1 condenses the evidence obtained by this 
and other techniques, evidence which, together with that previously 
assembled, shows conclusively that response sets are 
to be found in a great many tests. 
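
The design just described (count a given response for each person, then check whether the count is reliable) can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the computation used in any of the studies cited: it assumes an odd-even split of the items and applies the Spearman-Brown correction to the half-test correlation. The function names and the four sample papers are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a half-test score has zero variance")
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(papers, target):
    """Reliability of the count of `target` marks (assumed odd-even split).

    papers: one list of marks per person, all of the same length.
    target: the response whose frequency is the response-set score.
    """
    # Response-set score on each half: how often the target mark was used.
    odd = [sum(1 for m in p[0::2] if m == target) for p in papers]
    even = [sum(1 for m in p[1::2] if m == target) for p in papers]
    r_half = pearson(odd, even)
    # Spearman-Brown correction to full test length.
    return 2 * r_half / (1 + r_half)


# Hypothetical papers: L = like, I = indifferent, D = dislike.
papers = [
    list("LLLLLLLL"),
    list("LLLILDLL"),
    list("DILDIDLI"),
    list("DDIDIIDD"),
]
print(round(split_half_reliability(papers, "L"), 2))  # 0.9 for these papers
```

A high corrected coefficient means that persons who mark “L” often on one half of the items also do so on the other half, which is the sense in which a response set is said to be reliable.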

One study requires a separate report, because it is based on a 
factorially-designed test in which items are intended to be 
homogeneous. Kenneth Eells supplied the writer with tests 
“Cards” and “Figures,” from Thurstone's Tests of Primary 
Mental Abilities, which had been given to pupils in a Midwestern 
city as part of a study by the University of Chicago 
Committee on Cultural Factors in Intelligence Tests, under a 
grant from the General Education Board. Both of these tests 
present a geometric figure at the left of the row, and follow it 
with figures just like the given one save that they have been 
rotated through 90°, 180°, or 270°, or are mirror-images of one 
of these rotations. Directions are to “mark every card (figure) 
that is like the first card (figure).” It was observed that some 
pupils seem to search for all correct answers, whereas others are 
content to identify one or two seemingly correct answers, and 
then go on to the following row. Papers were drawn at random 
from those given to all pupils in two large junior high schools. 
Papers were discarded where any row had been omitted, or 
where the total score on Cards was high (4ft or more out of P4 
possible). This avoids spuriously high apparent reliability for 
the response-set score. The test had been given with double 
time, and the test was in effect unspeeded for the pupils studied. 
Two response-set scores were obtained for each pupil: Cards 
R + W, and Figures R + W. This score indicates a tendency to 
mark many items in a row. It implies thoroughness and persistence 
in marking, and perhaps acquiescence. The correlation 
of the two R + W scores is .54 (<Y ! 1 o<d. On the whole, those 
who mark fewer items appear to be poorer students, but no 
estimate of response sets, independent of ability, could be obtained. 
For the selected cases, the correlation of R + W Cards 
with R - W Figures was .44, and that of R + W Figures 
with R - W Cards was .;! p These data are interpreted as 
showing that in addition to the space factor (ability to discriminate 
similar forms), performance on this test is influenced by 
a response set. Many students are found who mark few or no 
incorrect figures (R + W = R - W) but who fail to mark all 
the correct alternatives. Since some of the reliable response-set 
variance is uncorrelated with the space factor, the entrance of 
response sets reduces the factorial purity of the test. Certainly 
tests which aim at measurement of a single factor must be designed 
to eliminate response sets. 
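
The two scores used above can be illustrated with a short sketch. It assumes that each row of a paper is recorded as the set of alternatives the pupil marked, and that the key gives the set of correct alternatives for that row; the function names and the sample data are hypothetical.

```python
def row_counts(marked, correct):
    """Return (R, W) for one row: right and wrong alternatives marked."""
    right = len(marked & correct)
    wrong = len(marked - correct)
    return right, wrong


def response_set_scores(paper, key):
    """Return (R + W, R - W) summed over all rows of one pupil's paper."""
    total_r = total_w = 0
    for marked, correct in zip(paper, key):
        r, w = row_counts(marked, correct)
        total_r += r
        total_w += w
    return total_r + total_w, total_r - total_w


# Hypothetical three-row key and two pupils.
key = [{1, 3, 4}, {2, 5}, {1, 2, 6}]
cautious = [{1}, {2}, {1}]                     # one safe mark per row
thorough = [{1, 2, 3, 4}, {2, 5}, {1, 2, 3, 6}]  # marks many, some wrong

print(response_set_scores(cautious, key))  # (3, 3)
print(response_set_scores(thorough, key))  # (10, 6)
```

The cautious pupil marks nothing incorrect, so R + W equals R - W, yet he leaves most correct alternatives unmarked. His low R + W reflects a way of working as well as his ability to discriminate forms.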

Response Sets in Multiple-Choice Tests. The only major form 
of fixed-alternative test which has so far been found free from 
response sets is the multiple-choice item. In order to determine 
whether response sets can be extracted from a typical test 
of this type, the writer has studied the Henmon-Nelson Test 
of Mental Ability, Form A, for Grades 3-8. The data for this 
study were supplied by Eells, from the study which provided 
the Thurstone data discussed above. Thousands of test papers 
were available, since every child in several grades in a midwestern 
city had been tested. The sample for this study was 
chosen indiscriminately, from papers of upper-lower and 
lower-middle-class children. In administering the test experimentally, 
Eells allowed an extended time of no minutes beyond the 
standard time of 30 minutes. Papers not completed even in 
the extended time were discarded in the present analysis. 

The Henmon-Nelson is a suitable test for investigating response 
sets because items were prepared with care, are fairly 
well arranged as to difficulty, and are designed so that the 
correct answer appears about equally often in each of the 
five response-positions. The hypothesis is that some students 
may persistently tend to select choices early in the group of 
five. This would raise their scores on items where the correct 
answer is choice “1” or “2” but lower them on items keyed 
4 or 5. The psychological basis for the hypothesis is the 
possibility that some students read every alternative and discriminate 
carefully, whereas some merely read through the item 
to find a plausible answer, mark it, and go on to the next item. 

The procedure was the usual one: to obtain a “bias” score 
for each individual and determine its reliability. If the score 
is reliable, the response set is proved to exist. The response-set 
score for the present hypothesis consists of “number of errors 
appearing to the left of the correct answer” minus “number of 
errors to the right of the correct answer.” Before rescoring 
papers for bias, papers of high-scoring pupils (those having 
a score above 60 out of 90 items correct) were discarded. This 
was done to increase the likelihood of finding a response set, 
since response sets have no opportunity to show themselves 
when the pupil gets most items correct. For a group of 66 
papers, bias scores ranged from 24 to —12. The person with 
the bias score 24 had made 39 errors to the left of the true 
answer, and only 15 errors to the right of the true answer. 
Such a preponderance is hard to explain as other than a habit 
of marking items. For the cases studied, however, the split- 
half reliability of the bias score was only .095, corrected. Such 
a low correlation indicates that the postulated response set 
is of no consequence for this group. A second sample of 84 
cases having raw scores of 40 or below in extended time (these 
pupils had IQ’s near or below 80) were studied separately, 
in order to increase the probability of finding a response set. 
For these pupils, the reliability of the bias score was .42, 
corrected. Evidently for a group of pupils taking a difficult 
multiple-choice test, reliable response sets can be found. Bias 
has a slight relation to raw score; the mean raw score for these 
poor pupils was 24.5 for those with negative bias, and 29 for 
those with positive bias. For some reason, very poor students 
tended to mark alternatives to the right of the correct answer 
proportionately more often than slightly better pupils. 
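
The bias score defined above lends itself to a brief sketch. The code assumes that the five response positions are numbered 1 to 5 from left to right and that every item has been marked (incomplete papers were discarded in the analysis); the function names and the six-item example are hypothetical.

```python
def bias_score(marked, key):
    """Errors to the left of the keyed answer minus errors to the right."""
    left = sum(1 for m, k in zip(marked, key) if m < k)
    right = sum(1 for m, k in zip(marked, key) if m > k)
    return left - right


def half_bias_scores(marked, key):
    """Bias scores on odd- and even-numbered items, for a split-half check."""
    odd = bias_score(marked[0::2], key[0::2])
    even = bias_score(marked[1::2], key[1::2])
    return odd, even


# Hypothetical six-item paper.
key = [3, 3, 3, 1, 5, 4]
marked = [1, 2, 3, 4, 5, 2]

print(bias_score(marked, key))        # 2: three errors left, one right
print(half_bias_scores(marked, key))  # (1, 1)
```

Correlating the two half scores over a group of pupils, and correcting for length, gives reliability coefficients of the kind reported in this section.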

An attempt was made to demonstrate such biases as "prefer¬ 
ence for position 1.” No statistical evidence for such sets could 
be obtained, although an occasional case does suggest that 
such biases may occur. One boy, for example, never in 90 
items marks the fifth choice as correct, and another student 
places 30 of his marks on position “1.” 

A second study was made with a modified version of the 
Ohio State University Psychological Examination, using data 
made available by N. L. Gage and Dora Damrin. The 
shortened test they used consists of 90 five-choice vocabulary 
items, unspeeded. This test was administered to unselected 
juniors and seniors in several high schools. When papers for all 
171 pupils were scored for tendency to place answers before 
rather than after the correct position, the odd-even reliability 
of the bias score was found to be .20. When only the lowest 65 students 
(as judged by the total number right on the test) were used 
as a sample to determine the reliability of the bias score, the 
reliability rose to .29. This was a group of students for whom the 
test was extremely difficult; the highest score for the group was 
12 right out of 90. It should be noted that this test is normally 
used for predicting college success among superior high school 
students; the highest score in this limited subdivision of our sample 
is only chance expectation. When an even more restricted 
sample was used (the lowest 26 cases, all of whom fell below a 
raw score of 15 items correct), the reliability of the bias score 
rose to .54. The mean bias score changed as the quality of students 
became poorer. For the total group, the mean bias score 
was —6.5; for the second group, —7.7; and for the very lowest 
group, —9.7. Here, also, the poorest students apparently tended 
particularly often to mark errors to the right of the correct 
answer. 

Both of these studies demonstrate that response sets are a 
minor factor, since so great a selection of cases was required 
in order to demonstrate any evidence of bias. Probably other 
multiple-choice tests where all subjects mark all items suffer 
little from response sets. Confirming studies on other multiple-choice 
tests are desirable, but the generally satisfactory experience 
with forced-choice tests should encourage their continued 
widespread use. 

Stability of Response Sets 

While there is ample evidence that response sets are consistent 
throughout a single test, it is important to determine 
whether they are characteristics of the individual stable from 
time to time, or are transient sets which can only be regarded 
as errors in testing rather than personality characteristics. 

Some evidence that response sets are stable appears in scattered 
studies. Thorndike (22, p. 33) reports that on a speeded 
Air Force test, scores obtained at the same sitting correlate no 
more than scores obtained several hours apart. If a speed-accuracy 
set is operating, it is not a set which shifts from hour 
to hour. Singer and Young (21) found that a tendency to rate 
varied stimuli as “pleasant” was highly stable, correlations as 
high as .90 being found under certain conditions over time intervals 
of two weeks. 

Whereas these and similar studies tend to stress the stability 
in response sets, we ordinarily think of mental sets as easily 
changed by suitable directions. If the response set is viewed as 
a way of interpreting an ambiguous situation, as when the word 
“like” is left for the subject to define, any change in directions 
should re-define the stimulus elements and alter individual re¬ 
sponse sets. Several studies show that this can be done. 

Rubin (20) several years ago demonstrated the existence of 
bias in the Seashore Pitch Test. He gave the Revised Test B 
twice to 245 college students, and found that the group as a 
whole used 13958 “H” responses and only 10542 “L” responses, 
in judging whether the second tone was higher or lower. According 
to the key, there were actually an equal number of 
differences in each direction. A similar mean bias was found 
by Rubin in data of Farnsworth. 

In two ingenious studies Rubin then established that tem¬ 
porary sets are a major element in bias. First he gave a “guess¬ 
ing” test, in which subjects imagined a tossed coin, and wrote 
down the way they imagined it would fall. One group was given 
directions as follows: “Imagine a coin which has an H for High 
on one side, and an L for Low on the other side.” In the other 
group this was reversed: “Imagine a coin which has an L for 
Low on one side, and an H for High on the other side.” There 
was a significant preponderance of the first-mentioned response 
on the first guessed item (i.e., each group tended to give first 
the response named first in its own directions). There was a significant 
preponderance of the second-named response on the third guess 
of the series. Rubin then applied the same reversal to the Seashore 
test directions. 272 students were told, “If the second 
tone is lower than the first tone, print L; if higher, print H.” 
Only 56.8 per cent of the errors were lows marked “H," com¬ 
pared to 60.0 per cent when much the same group were given 
the original directions (but note that some bias remained). 

A miniature experiment performed by graduate students as 
a class exercise gives further indication that response sets are 
easily altered. Lynn Henderson and Father Williams adminis¬ 
tered the revised Seashore Pitch Record B to ten students, re¬ 
peating the Record to make a total of 100 items. At the next 
class meeting, each student’s scored paper was returned to him 
for brief study. His attention was drawn specifically to the 
nature of bias by having him count whether he tended to mark 
“H” more often than “L.” He was informed that in each group 
of ten items, just half were correctly answered “High.” The 
writer conducted the discussion, talking about bias for about 
fifteen minutes and suggesting strongly that bias could be eliminated 
with effort and that pitch scores would be improved as a 
result. Papers were collected as soon as bias had been examined, to 
reduce the possibility of learning specific answers. Students were 
never informed, and few suspected, that the same record was 
used for both items 1 to 50, and 51 to 100. After the discussion, 
the 50-item record was readministered, the papers collected, 
and the record readministered again, yielding a 100-item post-test. 
This is admittedly an inadequate experiment, especially 
in the absence of a control group to measure the effect of practice 
and suggestion, separated from training regarding bias. 
The results are nevertheless striking (Table 2). Bias was notable 
on Tests IA and IB, largely eliminated on IIA and IIB. Total 
scores generally rose, especially on IIB. The amount of gain 
in score corresponds somewhat to the amount of initial bias, 
except for case 7, whose gain is presumably an effect of practice 
or motivation. This finding is not statistically significant. 
This study, small as it is, seems to show that bias can be 
eliminated by direct coaching which makes the subject aware 
of his own bias. If the Pitch Test measured pitch threshold 
alone, increased insight into habits of responding would not 
affect scores. The study does not prove that training in bias 
raises pitch scores, but it strongly suggests that this is true. 
Wyatt (28) also reports training subjects to avoid bias as a 
means of improving discrimination. Surely, on the basis of these 
data, it can be recommended that Seashore test papers should 
be checked for bias, and that where the person shows a marked 
bias in either direction scores should be regarded as probably 
giving too low an estimate of the person’s ability to discriminate 
pitch. 

Another report that altering directions affects response sets 
is made by Goodfellow (6). He finds that in psychophysical 
judgments the predisposition to report a stimulus as absent 
was reversed when the directions were worded: “Remember 
that in approximately one-half of the trials the correct answer 
will be yes.” 

The resemblance between response sets inferred from statis¬ 
tical data and “learning sets” found experimentally by Harlow 
(9) should be pointed out. In studies of monkeys, and also of 
children, he established definite evidence of generalized learn¬ 
ing to solve problems. The monkey enters an ambiguous situa¬ 
tion, namely, a discrimination apparatus where the proper 
choice among two alternatives leads to a food reward. In this 
situation, a personal communication from Harlow informs us, 
the monkey demonstrates a preference for one or another of 
the choices offered (e.g., for the red object rather than the 
blue). This preference may serve to increase errors (if, for in¬ 
stance, the square object has been keyed as correct, regardless 
of color). If the monkey is put through one learning series 
after another, in which a different cue differentiates the right 
and wrong choices in each series, the monkey quickly learns to 
learn. His learning curve on later series is strikingly steep. 
"With each successive block of problems the frequencies of 
errors attributable to these factors [one of which is initial pref¬ 
erence or response set] are progressively decreased. ... The 
process might be conceived of as a learning of response tendencies 
that counteract the error-producing factors.” 

Harlow has therefore shown that response sets are present 
in the new, ambiguous situation, and that under his conditions 
they are extinguished. In contrast, the test-taking sets of adults 
appear not to be extinguished by usual experiences, even though 
they increase the probability of error. The difference appears 
to be that in Harlow’s experiment there is an immediate frustration 
attached directly to the wrong (preference-determined) 
response. In school tests the penalty is delayed, and is usually 
attached to the total test performance rather than to the 
specifically wrong responses. False approaches to problems, 
such as biases, can be eliminated; sound sets, such as reading 
each item carefully, can be learned. But direct and immediate 
teaching will be more effective than such incidental punishments 
as low total scores. 

Generality of Response Sets 

To some degree, a person shows consistent response sets 
from situation to situation. Table 3 summarizes studies bearing 
on this question. When similar situations are presented, response-set 
scores are significantly correlated. But there is no 
evidence that response sets are consistent over widely different 
situations, and Singer and Young’s evidence indicates that this 
is not true. But one does not measure response sets alone. Response 
sets show only when the response to a situation is in 
some way unclear. Singer and Young point out that habits of 
using their rating scale are operative only when “affective 
arousal is weak or absent.” Perhaps affective arousal is weak 
for one person on tones, for another on odors. This would re¬ 
duce the response-set correlations. 

Response sets might be mere incidental sources of error in 
measurement, or they might reflect deeper personality traits. 
Evidence from many sources now combines to show that re¬ 
sponse sets reflect “real” variables. 

Johnston (13) gave the Bernreuter Inventory and the Hunter 
Attitude Scale to two groups of teachers. These groups were 
chosen on the basis of ratings by their principals, so that one 
group consisted of "autocratic” teachers, and one consisted of 
teachers who were markedly "democratic” in classroom prac¬ 
tice. Johnston found that these groups differed significantly in 
response sets. On the Bernreuter, the autocratic group gave an 
average of 52.6 "Yes,” 62.3 "No,” and 10.8 “?” responses. The 
three totals for the democratic group were 55.9, 66.8, and 4.7 
respectively. There were 42 teachers in the former group, and 
43 in the latter. The difference in “tendency to use question 
marks” (evasion?) was significant (P < .01). There was a similar 
difference on the Hunter scale. The mean number of statements 
marked “Undecided” rather than “Agree” or “Disagree” 
was 15 in the autocratic group and 10 in the democratic group 
(P < .01). 

Mersman (17), in a small study of vocational interests, com¬ 
pared the Bernreuter responses of college students planning to 
be lawyers, musicians, and engineers. There were seventy-five 
cases in each group. Upon analyzing the number of responses 
of each type in each group, he found the following means: 

The differences between engineers and musicians are significant 
(1% level). 

Evidently groups differentiated on external criteria also differ 
in response sets. Where this is so, part of the response-set 
variance must represent some real variable. For example, use 
of question marks may indicate anxiety and evasiveness of 
personality, rather than a transient set alone. Lorge (15) finds 
that the tendency to say “Yes,” “No,” and “?” (estimated 
from several tests) correlates as follows with scores on the 
Flanagan-Bernreuter keys: 

                 Yes      No        ? 

Confidence       2-7      ~ * J S   “-0.7 
Sociability      .00      .ay       -.26 


Possible significance of response sets for empirical prediction 
is suggested by a study which finds that tendency to respond 
“?” is correlated negatively with success in selling life insur¬ 
ance (14). While the relationship found was not statistically 
significant, the difference between the mean number of ques¬ 
tion marks in the good and poor groups (8.4 vs. 12.8, CR 1.57) 
is large enough to suggest further investigation along this line. 

Improvement of Test Design 

The heterogeneous bits of evidence pieced together here and 
in our previous report have established several generalizations. 

1. Any objective test form in which the subject marks fixed 
response alternatives (“Yes”-“No,” “True”-“False,” “e,” 
etc.) permits the operation of individual differences in response 
sets. The influence of response sets in the multiple-choice test 
is, however, of minor importance. 

2. Response sets have the greatest variance in tests which 
are difficult for the subjects tested, or where the subject is un¬ 
certain how to respond. 

3. Items having the same ostensible content actually measure 
more than one trait, if response sets operate in the test. This is 
true even for tests which, scored as a whole, are "factorially 
pure.” 

4. Slight alterations in directions, or training in test-taking, 
alter markedly the influence of response sets. But if the situa¬ 
tion is not re-structured by the tester, individual differences in 
response set remain somewhat stable when similar tests are 
given at different times. 

5. Response sets are to a small degree correlated with ex¬ 
ternal variables such as attitudes, interests, and personality. 
This shows that they are in part a reflection of “real” and 
stable traits. To this degree, response-set variance may be 
valid variance in some investigations. 

6. Tests are usually constructed to measure a trait defined 
by the content of the test items. If the form of the items per¬ 
mits response sets, two persons having equal true scores on 
the content factor will often receive different scores on the test. 
Response sets therefore ordinarily dilute the test and lower its 
validity. 

Paragraphs (5) and (6) crystallize the paradox response sets 
present. Some of the response-set variance is potentially useful, 
some of it is an interference with measurement. The problem 
for the tester is to capitalize on the effect of response sets where 
they are helpful to validity, and to eliminate their influence 
where it is undesirable. It is therefore important to decide 
which view is to be taken in any given situation. The writer 
has attempted to formulate rationally the response-set problem 
in factorial terms. The analysis has been unsuccessful, primarily 
because response sets do not obey the fundamental 
additive law of factor theory. One cannot define a person’s test 
score as a weighted addition of his content-factor and response- 
set-factor scores, since response sets have an influence on his 
performance on each item proportional to his doubtfulness. 
That is, the weight for the response-set factor in any item is 
not a constant for all persons, but is a function of each person’s 
score in the content factor. Since the problem is not at present 
formulated analytically in a way which clarifies our thinking, 
we are confined to a general description of the relations. 
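
The difficulty can be set out symbolically. The notation below is an illustrative assumption and is not the writer's own: $x_{ij}$ is the score of person $i$ on item $j$, $c_i$ and $s_i$ are his content-factor and response-set-factor scores, $a_j$ and $b_j$ are item weights, and $e_{ij}$ is error. The additive law would require the first form; the argument above implies something closer to the second, in which the response-set weight varies with the person's standing on the content factor.

```latex
% Additive factor model (assumed notation): the same weights hold for every person
x_{ij} = a_j c_i + b_j s_i + e_{ij}

% Form implied by the argument: the weight on s_i depends on c_i,
% being larger where person i is more doubtful about item j
x_{ij} = a_j c_i + b_j(c_i)\, s_i + e_{ij}
```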

Considering only biases such as acquiescence and evasive¬ 
ness, response-set variance may be conceived as containing the 
following elements, combined in some proportion: 

1. Chance variance; resulting from purely random excess of 
choice of one or another alternative. 

2. Internally consistent but momentary response tendencies; 
sets operating throughout one testing, but shifting on a 
retest at another time. 

3. Stable response tendencies; sets operating consistently even 
when the same test is given at different times. 

Evidence of the existence of Type 3 variance has been consistently 
found whenever investigators have sought it. Evidence 
for Type 2 variance is lacking, but it may be postulated on the 
grounds that no observed trait is expected to be perfectly 
stable. And of course chance variance is always with us. 

Response-set variance of Type 1 is not important; it is 
simply another manifestation of error variance, and its influence 
can be reduced by lengthening the test. Variance of Type 2 is 
unquestionably harmful, unless one happens to be doing research 
on evanescent sets or moods or some other fluctuating 
variable (for example, a study of mood changes concomitant 
with fatigue). Type 2 variance cannot correlate with stable variables, 
and therefore lowers the validity coefficient of the test. 
Moreover, Type 2 variance is present in many items and probably 
increases the coefficient of equivalence (split-half or Kuder-Richardson 
reliability) of the test. Therefore, even if the test 
given on a particular day were lengthened indefinitely, we 
could not raise its empirical validity to 1.00 because scores are 
partly saturated with an invalid factor. Type 3 variance is 
potentially useful, but to understand its action we must divide 
it between 

3a. Valid variance, the portion of 3 that correlates with the 
criterion the test is intended to predict, and 
3b. Invalid variance, the portion of 3 that does not correlate 
with the criterion. 

We may always expect a portion of Type 3b, since the response 
set could correlate perfectly with the criterion only if the 
criterion is itself a set or a personality trait causing the set. 

Variance of Type 3a does exist, since in some studies the 
response-set score did correlate with some external variable. 
Moreover, research in a good many fields is turning to per¬ 
sonality variables which may be close cousins to response sets. 
Guilford anticipates that the “carefulness” factor, which is a 
response-set, may prove to have validity as a component of a 
battery for aircrew selection. In studies of prejudice or liberalism, 
an investigator may find evidence on negativism useful. 
And this is possibly one source of bias toward “No” and “Disagree” 
in taking tests. Variance of Type 3b reduces validity, 
and limits the maximum possible validity the test can have 
even if trials on different days are combined. Variance of Type 
3a may increase validity if it is added into the score in one 
way, or it may lower validity if it is added in differently. Thus 
the studies of true-false tests (5) show that students tend to 
say “True" when in doubt, and the duller students, who are 
in doubt most often, say “True” most often. This raises their 
score on true items, lowers it on false items. Hence the poten¬ 
tially valid portion of the response-set variance lowers the dis¬ 
criminating power and validity of true items, and enhances the 
validity of the false items. 
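
A small worked calculation may make this last point concrete. It is a sketch under stated assumptions, none of which come from the studies cited: each student knows a fixed proportion of the items, answers those correctly, and marks “True” with a fixed probability on the remainder. The function name and all figures are hypothetical.

```python
def expected_scores(p_know, p_true_when_unsure, n_true, n_false):
    """Expected number right on true-keyed and on false-keyed items."""
    # On a true-keyed item, a doubtful "True" happens to be correct.
    p_right_true = p_know + (1 - p_know) * p_true_when_unsure
    # On a false-keyed item, only a doubtful "False" is correct.
    p_right_false = p_know + (1 - p_know) * (1 - p_true_when_unsure)
    return n_true * p_right_true, n_false * p_right_false


# Two hypothetical students, both saying "True" 70% of the time when in doubt.
strong = expected_scores(0.8, 0.7, 50, 50)  # knows 80% of the items
weak = expected_scores(0.4, 0.7, 50, 50)    # knows 40% of the items

gap_true = strong[0] - weak[0]    # difference on the 50 true-keyed items
gap_false = strong[1] - weak[1]   # difference on the 50 false-keyed items
print(round(gap_true, 1), round(gap_false, 1))
```

With these assumed figures the two students differ by about 6 points on the true-keyed items and about 14 points on the false-keyed items. The weaker student's doubtful “True” responses earn credit on true items and lose it on false items, so the false items separate the students more sharply.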

Finally, it should be noted that there is no possibility of 
separating the four types of response-set variance in data from 
a single test; they come entangled in a single performance, and 
we must therefore consider the effect of the response-set vari¬ 
ance as a unit. This total is made up of a random element 
(Type 1), a real but invalid element (Type 2, 3b), and a poten¬ 
tially valid element (3a) which may in practice raise or lower 
the validity of the test score. Of these three categories, only 
3a, the valid variance, is likely to be entirely absent, and the 
size of the correlations of response sets with external variables 
suggests that 3a is not likely to be the principal component of 
the variance. Therefore: 

a. The probable effect of response-set variance is harmful, 
since elements 2 and 3b are usually present, and these ele¬ 
ments reduce the extent to which the test is saturated with 
the content factor it is supposed to measure. 

b. Even if valid variance is present, its effect may be to lower
validity of some items or of the total score. But under
certain circumstances, it may be treated in such a way that
it raises the validity coefficient.

c. Only under exceptional circumstances, when a test is de¬ 
signed to study the very personality characteristics which 
are reflected in the response set, does the response set appear 
to be a potentially helpful source of variance. 
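
The ceiling that invalid variance places on validity can be made concrete with a small calculation. The sketch below is illustrative only: it assumes, beyond what the text states, that the content factor and the several types of response-set variance are uncorrelated, and that the criterion is perfectly predicted by the valid portion (content plus Type 3a). The function name and the variance figures are hypothetical.

```python
import math

def max_validity(content, type1, type2, type3a, type3b):
    """Upper bound on the validity coefficient of a total score.

    Assumes uncorrelated variance components and a criterion that
    correlates perfectly with the valid part (content + Type 3a).
    Under those assumptions the bound is sqrt(valid / total).
    """
    total = content + type1 + type2 + type3a + type3b
    valid = content + type3a
    return math.sqrt(valid / total)

# Hypothetical variance budget: 64 units content, 16 random (Type 1),
# 10 real but invalid (Type 2), none of Type 3a, 10 of Type 3b.
print(round(max_validity(64, 16, 10, 0, 10), 3))
```

With these made-up figures, 36 per cent of the score variance is random or invalid, and no amount of lengthening or retesting could carry validity past the bound the function returns.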

Because the operation of response sets upon score is complex,
a detailed illustration seems worthwhile. A spelling test
is planned, using the directions: “Some of these words are correctly
spelled and some incorrect. Mark every item, + if correct,
o if incorrect.” If the test is intended to indicate whether
the student will identify errors in his own writing outside of
school, this form of item has an appealing resemblance to the
criterion task. Now suppose we have 6 students. A, B, and C
know 40 words out of 60 and are doubtful on the remainder. D,
E, and F know 30 words. (This oversimplification of “knowing”
a word avoids difficulty in this explanation.) A and D have no
response set. Of the 60 words, just half are wrongly spelled,
and when A and D are doubtful, they mark just half of the
unknown words o. B and E are a little undercritical in non-school
writing; they fail to notice some errors. But in taking a
school test, they suspect the teacher of planting errors where
there are none, and so mark o 60 per cent of the time when
they are doubtful. C and F are undercritical in all their writing,
and in taking the test they are also willing to accept errors;
they mark o only 30 per cent of the time when in doubt. The
scores then may develop as follows:


                                     A      B      C      D      E      F
Bias (proportion of + responses
  to o responses)                  50/50  40/60  70/30  50/50  40/60  70/30
Words known                         40     40     40     30     30     30
Guesses correct by chance:
  guessed +                          5      4      7      7½     6     10½
  guessed o                          5      6      3      7½     9      4½
Most probable score                 50     50     50     45     45     45
Maximum possible correct guesses:
  guessed +                         10      8     10     15     12     15
  guessed o                         10     10      6     15     15      9
Maximum possible score              60     58     56     60     57     54
Minimum possible correct guesses:
  guessed +                          0      0      4      0      0      6
  guessed o                          0      2      0      0      3      0
Minimum possible score              40     42     44     30     33     36

In this, as in other problems, the tendency is for bias to restrict
the range of scores, not to alter the mean score. Where an unbiased
person may, with lucky guesses, earn a very high score,
the biased person has a much smaller probability of reaching
the same total. Bias which reflects “true criticalness” operates
in the score no differently from bias which is only a special set
used in taking a test. If the items are divided so that 70 per
cent of the words are correctly spelled, C and F are given an
advantage, even over A and D. If more than half the spellings
are incorrect, B and E will tend to earn higher scores than
those who know an equal number of words (and are equal on
the criterion).
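
The arithmetic of the spelling illustration can be checked with a short sketch. It assumes, as the figures in the illustration imply, that the doubtful words contain the same share of correct spellings as the test as a whole, and that a student's marks on doubtful words are unrelated to the true spelling. The function and variable names are hypothetical.

```python
from fractions import Fraction

def spelling_scores(known, plus_rate, total=60, share_correct=Fraction(1, 2)):
    """Return (most probable, maximum, minimum) score for one student.

    known         -- words the student knows (always marked correctly)
    plus_rate     -- share of doubtful words the student marks "+"
    share_correct -- share of doubtful words that are correctly spelled
                     (assumed equal to the share in the whole test)
    """
    unknown = total - known
    n_right = unknown * share_correct   # doubtful words spelled correctly
    n_wrong = unknown - n_right         # doubtful words misspelled
    n_plus = unknown * plus_rate        # doubtful words marked "+"
    n_o = unknown - n_plus              # doubtful words marked "o"

    expected = known + n_plus * share_correct + n_o * (1 - share_correct)
    best = known + min(n_plus, n_right) + min(n_o, n_wrong)
    worst = known + max(0, n_plus - n_wrong) + max(0, n_o - n_right)
    return expected, best, worst

students = {"A": (40, Fraction(1, 2)), "B": (40, Fraction(2, 5)),
            "C": (40, Fraction(7, 10)), "D": (30, Fraction(1, 2)),
            "E": (30, Fraction(2, 5)), "F": (30, Fraction(7, 10))}

for name, (known, plus_rate) in students.items():
    balanced = spelling_scores(known, plus_rate)
    # Same student when 70 per cent of the words are correctly spelled.
    skewed = spelling_scores(known, plus_rate, share_correct=Fraction(7, 10))
    print(name, [float(x) for x in balanced], float(skewed[0]))
```

On the balanced test the most probable score depends only on the number of words known, while the maximum and minimum close in as bias grows. On the 70 per cent key the expected score of the student biased toward + rises above that of the unbiased student with the same knowledge.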

In an unbiased test, where all alternatives have an equal 
weight in the total test, response sets do not add to the variance 
of scores, but have a damping effect, reducing the range of 
points people may earn from a combination of guessing and 
partial knowledge. If one alternative is present more than 
another, response sets form part of the variance of the test 
scores. 
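
The damping effect on a balanced test can be expressed as a variance. The sketch below rests on a modeling assumption not made in the text: that a student places a fixed number of + marks at random among the doubtful words, so that the number of + marks falling on correctly spelled words is hypergeometric. The guess score then equals twice that count plus a constant, and its variance is four times the hypergeometric variance. The function name is hypothetical.

```python
from fractions import Fraction

def guess_score_variance(n_unknown, n_right, n_plus):
    """Variance of the number of correct guesses on doubtful words.

    n_unknown -- number of doubtful words
    n_right   -- how many of them are correctly spelled
    n_plus    -- how many of them the student marks "+"
    """
    p = Fraction(n_right, n_unknown)
    hyper_var = (n_plus * p * (1 - p)
                 * Fraction(n_unknown - n_plus, n_unknown - 1))
    return 4 * hyper_var

# Students A, B, and C of the illustration: 20 doubtful words, 10 correct.
for name, n_plus in (("A", 10), ("B", 8), ("C", 14)):
    print(name, float(guess_score_variance(20, 10, n_plus)))
```

Under this model the unbiased student A has the largest guessing variance, and the variance shrinks as the split of marks departs from 50/50, which is the restriction of range described above.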

Methods of eliminating response-set variance.—The writer concludes
that as a general principle, the tester should consider
response sets an enemy to validity. Even when seeking to
measure a trait resembling a response set, one can have confidence
in the meaningfulness of the score only after showing
that variances 1, 2, and 3b are small in proportion to 3a.
Therefore, in most tests and certainly in those not intended to
measure personality, we should keep response sets from affecting
the test score by one of the following methods: designing
test items which prevent response sets, altering directions to
reduce response sets, or correcting for response sets.

(a) Test design.—Since response sets are a nuisance, test designers
should avoid forms of items which response sets infest.
This means that any form of measurement where the subject 
is allowed to define the situation for himself in any way is to 
be avoided. (We must make an exception for tests where his 
way of interpreting the test is treated as a significant variable. 
But even so, the above analysis suggests limits to the possible 
validity of tests like the Rorschach which capitalize on am¬ 
biguity.) 

Item forms using fixed response-categories are particularly
open to criticism. The attitude-test pattern, where the subject
marks a statement A, a, U, d, or D, according to his degree of
agreement, is open to the following response sets: acquiescence,
or tendency to mark “A” and “a” more than “d” and “D”;
evasiveness, tendency to mark “U”; and tendency to go to extremes,
to mark “A” and “D” more than “a” and “d.” Probably
not all three of these sets will operate to a significant
degree in any given test, but it is better to eliminate the sets
at the outset than to spend effort later trying to measure the
effect of the sets and root them out. Test designers generally
have argued for retaining the five-point scale of judgment, or
the more indefinite seven-point, ten-point, or even continuous
scales. Such scales are open to marked individual differences in
definition of the reference positions, with the more complex
scale offering more chance for personal interpretation. The usual
argument for the more finely divided scale of judgment on each
attitude item is that it is more reliable and that subjects prefer
it. If the latter advantage is significant, the finer scale may be
retained and scored dichotomously. The argument that the
finer scale gives more reliability is not a sound one, since this
is precisely what we would expect if all of the added reliable
variance were response-set variance and had no relation to beliefs
about the attitude-object in question. There is no merit in
enhancing test reliability unless validity is enhanced at least
proportionately. It is an open question whether a finer scale
of judgment gives either a more valid ranking of subjects according
to belief, or (what we are beginning to recognize as
even more important) scores more saturated with valid variance.
With raters trained to interpret the scale uniformly, so that
response-set variance is removed, the finer scale may be
advantageous.
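
The three sets named for the A, a, U, d, D pattern, and the dichotomous scoring suggested for a retained five-point scale, might be computed along the following lines. This is a minimal sketch: the index definitions (differences in proportions), the rule that “U” earns no credit, and all names are assumptions made for illustration, not procedures taken from any published scale.

```python
from collections import Counter

def response_set_indices(marks):
    """Summarize one subject's marks on A, a, U, d, D items."""
    c = Counter(marks)
    n = len(marks)
    return {
        # agreement marks minus disagreement marks
        "acquiescence": (c["A"] + c["a"] - c["d"] - c["D"]) / n,
        # share of items marked "U"
        "evasiveness": c["U"] / n,
        # emphatic marks minus moderate marks
        "extremeness": (c["A"] + c["D"] - c["a"] - c["d"]) / n,
    }

def dichotomous_score(marks, keyed_agree):
    """Count items answered in the keyed direction, ignoring intensity.

    keyed_agree[i] is True when agreement is the keyed response to
    item i. A mark of "U" earns no credit (an assumption of this sketch).
    """
    score = 0
    for mark, agree_is_keyed in zip(marks, keyed_agree):
        if mark == "U":
            continue
        agrees = mark in ("A", "a")
        score += int(agrees == agree_is_keyed)
    return score

marks = list("AaUdDAaa")
key = [True, True, False, False, False, True, False, True]
print(response_set_indices(marks))
print(dichotomous_score(marks, key))
```

Scored this way, “A” and “a” count alike, so differences in the tendency to go to extremes no longer enter the attitude score, though they remain available as a separate index.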

The writer therefore renews his earlier recommendation that
the following forms of item be avoided in tests where high
validity is more important than speed of test construction:
true-false, like-indifferent-dislike, same-different, yes-no,
agree-uncertain-disagree, and “mark all correct answers.” What
does this leave? Foremost, it leaves the forced-choice or best-answer
test. Our attempt to find a response set in the multiple-choice
test was almost completely unsuccessful. A set was extracted,
and that a set with little reliability, only when the
test was applied to subjects for whom it was unreasonably
difficult. Further studies of multiple-choice tests are still in
order, but experience to date justifies the assumption that they
are generally free from response sets. One confirmation of the
argument that forced choices should be used comes from a
study by Owens (18). He found that substituting forced-choice
for the “yes-no” response of the conventional neurotic inventory
significantly reduced the number of false positives, i.e., it
increased empirical validity. The forced choice has long been
used successfully in many fields. Tests of mental ability now
use it almost to the exclusion of other forms. Spelling, arithmetic,
and grammar tests can certainly be cast in “recognize
the right (or wrong) choice” form, rather than checklist forms
and others open to response sets. Thurstone used it successfully
in his paired-comparison approach to attitudes, and the
same approach has long been found satisfactory in psychophysics.
The Kuder interest test is well known, and Kuder has
recently developed a new test of personality in the same forced-choice
form. Paired comparisons may serve well in employee
rating, and the Army has found the forced-choice valuable in
obtaining officer ratings. Apparently forced-choice items can
be used for nearly all purposes now served by the inadequate
item forms.

Another important consideration is test difficulty, regardless
of item form. The influence of response sets rises with difficulty,
and therefore measurement of differences between students
who find the test difficult is particularly invalid. This is, first,
a reason for not using a test on subjects for whom it is quite
difficult. Second, however, it suggests basing measurement on
scales of adaptable difficulty. Thus, with the Kuhlmann-Anderson
mental-test series, one selects the scales which have a
difficulty appropriate for the subject, and if the first tests tried
prove to be too difficult, the tester can move to an easier set
of items to obtain more accurate measurement. Tests of this
type, which are common in psychophysics, would be hard to
use in group measurement; but experimental trial of such test
designs is worth considering. If the Seashore Pitch Test, for example,
were redesigned, one might have a preliminary section
of twenty (?) items, ranging from very hard to very easy. This
could be scored as soon as completed, and if the score were high,
the subject would be given a difficult 50-item test (perhaps
with all differences five cycles or two cycles). But a subject
who performed near the chance level on the preliminary test
would be given a final test of items with large differences (perhaps
20 to 30 cycles). A set of several overlapping scales would
be required, all standardized on the same group. Such a test
could not test large groups inexpensively, but could be quite
accurate in testing individuals.
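
The routing step of such a two-stage design could be sketched as follows. Everything specific here is an assumption for illustration: two-choice items (chance probability 0.5), a 5 per cent binomial criterion for “near the chance level,” a 90 per cent cut for a “high” score, and the three form labels.

```python
from math import comb

def chance_tail(score, n_items, p_chance=0.5):
    """P(at least `score` correct) when every answer is a blind guess."""
    return sum(comb(n_items, k) * p_chance**k * (1 - p_chance)**(n_items - k)
               for k in range(score, n_items + 1))

def choose_final_form(prelim_score, n_items=20, alpha=0.05, high_share=0.9):
    """Route a subject to an easy, middle, or hard final scale."""
    if chance_tail(prelim_score, n_items) > alpha:
        return "easy"      # score not distinguishable from guessing
    if prelim_score >= high_share * n_items:
        return "hard"      # e.g. items with very small differences
    return "middle"

for score in (10, 16, 19):
    print(score, choose_final_form(score))
```

The overlapping scales would still need a common standardization, as noted above, before scores from the different final forms could be compared.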

(b) Modification of directions.—If, in any test, we expect a
particular response set to arise, we can revise the directions to
reduce the ambiguity of the situation. Another way of accomplishing
the same end is to give students general training in
test-wiseness. For example, if they know that in most true-false
tests about half the items are false, they will tend to
avoid excessive acquiescence. If they know that the correction
formula is based on chance, they will know that the odds are
in their favor when they respond to items where they are uncertain.
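
The point about the correction formula can be verified numerically. The sketch uses the conventional correction for guessing, rights minus wrongs divided by one less than the number of choices; the function names are hypothetical.

```python
def corrected_score(rights, wrongs, n_choices):
    """Conventional correction for guessing: R - W / (k - 1)."""
    return rights - wrongs / (n_choices - 1)

def expected_gain_from_guess(p_correct, n_choices):
    """Expected change in corrected score from answering one doubtful item.

    p_correct is the student's chance of being right on that item.
    """
    return p_correct - (1 - p_correct) / (n_choices - 1)

print(corrected_score(40, 10, 2))          # true-false test
print(expected_gain_from_guess(0.5, 2))    # blind guess: no gain, no loss
print(expected_gain_from_guess(0.6, 2))    # partial knowledge: positive
```

A blind guess neither helps nor hurts on the average, so any partial knowledge at all makes responding the better policy.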

It appears to the writer that, in most tests, subjects should
be directed to answer all items, even though this tends to increase
the random error variance. In many situations, this
source of error is less damaging than the constant errors introduced
by differences in tendency to guess, checking threshold,
or diligence in searching for correct answers. Wesman (25)
reports partial evidence that grammar items, where the subject
marks each error he notices in given sentences, become
more reliable when the subject is directed to mark every sentence-part
“correct” or “incorrect,” rather than just checking
the “incorrects” (but evidence on validity is lacking).

Whisler (27) raised the question of response-habits in Thurstone-type
attitude scales. He found that some subjects marked
six or more items in a 22-item scale, and for them the reliability
(parallel-test) of the attitude score was .89. But for the subjects
who marked five or fewer items that they agreed with,
the reliability was .62. Whisler thought that the subjects who
checked more items were more careful in using the scale, or
that their attitudes were more integrated. Hancock (8) followed
Whisler with an experimental alteration of directions. First, he
directed subjects to mark all the statements they accepted,
then the five with which they most agreed, and, finally, the three
of that five which they most strongly accepted. The shift of
directions produced some alteration in scores. Generally, the
standard deviation (in scale value) of scores increased when
fewer items were counted. For those with attitudes favorable
to an occupation, the more items they checked, the closer
their score was to the indifference position. Unfortunately,
there is not enough evidence in the Hancock report to give a
basis for selecting any particular number of checks as preferable.
If the number of items checked affects mean, sigma, and
reliability, there can be little justification for permitting the
number to vary. It appears desirable to require every subject
to mark a fixed number of alternatives, selecting the statements
with which he most agrees. Limited experience with this
procedure suggests that the subject should check around one-fourth
of the statements.
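
The dependence of a Thurstone-type score on the number of statements checked can be illustrated with a toy scale. The scale values and the subject's ordering below are invented for the example, and the score is taken as the mean scale value of the checked statements (the median is also used in practice).

```python
from statistics import mean

def thurstone_score(scale_values, agreement_rank, n_checked):
    """Mean scale value of the n_checked statements ranked highest.

    scale_values   -- scale value of each statement
    agreement_rank -- statement indices ordered from most to least agreed
    """
    chosen = agreement_rank[:n_checked]
    return mean([scale_values[i] for i in chosen])

# Hypothetical 8-statement scale; high values are favorable.
values = [1.0, 2.5, 4.0, 5.5, 7.0, 8.5, 9.5, 10.5]
favorable_subject = [7, 6, 5, 4, 3, 2, 1, 0]   # agrees most with statement 7

for k in (2, 4, 6):
    print(k, thurstone_score(values, favorable_subject, k))
```

For this favorable subject the score drifts toward the middle of the scale as more statements are counted, which is the pattern reported above; fixing the number of checks removes that source of variation between subjects.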

(c) Correction for response sets.—When response sets are
entering scores on a test, we may control or correct for the
effect by special scoring keys. One widely used method is the
control score. If a “response-set score” can be obtained, we
may identify all cases with extreme response sets and drop such
cases from the sample, admitting that measurement for them
is invalid. The most familiar examples appear in the control
scores of the Minnesota Multiphasic. Many other tests also
permit us to derive such scores as bias or acquiescence, or
number of items marked. In some tests it may be acceptable
to report two scores for every subject; all the essential data
in the hypothetical spelling test discussed earlier could be reported
in one score “number right” and a second “number
marked as incorrect.” But simultaneous consideration of patterns
of scores is awkward.
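
Screening on a control score might be sketched as follows. The cutoff of two standard deviations is an arbitrary illustrative choice, and the function name is hypothetical; any operating rule would have to be set from evidence on the particular test.

```python
from statistics import mean, pstdev

def flag_extreme(control_scores, z_cut=2.0):
    """Mark cases whose control score lies far from the group mean.

    Returns one boolean per case; True means the case would be set
    aside as probably invalid under this illustrative rule.
    """
    m = mean(control_scores)
    s = pstdev(control_scores)
    if s == 0:
        return [False] * len(control_scores)
    return [abs(x - m) / s > z_cut for x in control_scores]

# Hypothetical "number marked as incorrect" scores for eight subjects.
scores = [10, 11, 9, 10, 12, 8, 10, 30]
print(flag_extreme(scores))
```

Only the last subject, who marked far more items as incorrect than anyone else, is flagged in this example.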

Humm has long used the No-Count as a control score on his 
Temperament Scale. A comment in the Supplemental Manual 
for that test is of interest: 

It was observed that subjects whose scores in the Scale were
at variance with the results of case studies by psychiatrists,
psychologists, and social workers were found more often among
those with an ultra-high or an ultra-low proportion of no-responses,
than was the case where no-responses were in the
middle ranges. Individuals who answer the questions of the
scale with a high number of no-responses tend, consciously or
unconsciously, to obscure their real temperaments. On the other
hand, individuals with a low number of responses may exaggerate
their temperamental characteristics.

Eliminating cases with extreme control-scores has the disadvantage
of throwing out numerous subjects, but it is vastly
better than treating the subjects as if the scores were valid.
Sometimes a simple solution is to readminister the test with
more careful directions, as Bennett and others illustrate (1).
But more complex correction procedures are possible. In this,
Humm and his co-workers were also pioneers.

Two procedures have been developed for cases where No-Counts
are extreme. The first is the “profile score.” For an
initial sample of x81 cases, Humm had a criterion score on
each component the test claimed to measure. The profile score
is the best estimate of the criterion score from the uncorrected
score and the No-Count. This procedure, regressing from an
external criterion rather than merely partialling out No-Count
in terms of the zero-order r between No-Count and raw component
score, allows for the very reasonable assumption that
part of the No-Count variance represents significant elements
in personality.

The second correction, reserved only for cases where profile
scores are inadequately revealing, yields the “regression score.”
This “stated the standard deviational distance of the given
component score from the mode of scores in that component
attained in scales showing the same No-Count. The regression
score takes no account of validity. It does not, therefore, consider
how well the Component Score measures the ‘true’ component
strength.” This, of course, partials out all the response-set
portion of the score variance.
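
A score of the regression-score kind, expressing each component score relative to other papers with a similar No-Count, might be approximated as below. This is not Humm's procedure: the sketch substitutes the group mean for the mode named in the quotation, groups No-Counts into bands of an arbitrary width, and uses hypothetical names and data throughout.

```python
from collections import defaultdict
from statistics import mean, pstdev

def regression_scores(records, band_width=10):
    """Standard-deviation distance of each score from its No-Count peers.

    records -- list of (component_score, no_count) pairs
    Papers are grouped into No-Count bands of `band_width`; the mean of
    each band stands in for the mode (an assumption of this sketch).
    """
    bands = defaultdict(list)
    for score, no_count in records:
        bands[no_count // band_width].append(score)
    out = []
    for score, no_count in records:
        peers = bands[no_count // band_width]
        sd = pstdev(peers)
        out.append((score - mean(peers)) / sd if sd > 0 else 0.0)
    return out

records = [(10, 5), (14, 7), (20, 25), (30, 28)]
print(regression_scores(records))
```

Because each paper is compared only with papers of like No-Count, whatever part of the score goes with the No-Count is removed, the valid part along with the invalid.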

Humm and Humm (12) report that their procedures raise
the validity of interpretations, for those papers where correction
is required. Similar methods could no doubt be applied to
other tests, and in the K-correction of the Multiphasic, a similar
treatment is illustrated. Such refined statistical improvements
are worth making only when one intends to treat a test
quite seriously. It would scarcely be worthwhile to build a correction
score for acquiescence into the Bernreuter test, in view
of the many other bases for doubting its validity. But where
great statistical labor in the form of factor analysis has already
entered such a test as Guilford’s series, application of a control
score for response sets may be worth serious consideration.

Correction for response sets is a problem in suppressor variables
(10, pp. 140-142). We wish to retain valid response-set
variance (Type 3a), but we wish to remove from the score the
variance of Type 3b and 2. If an independent estimate of the
Type 3a variance, or of the combined undesirable variance,
could be obtained by a pure measure of the response set itself,
this estimate might be used as a suppressor variable.
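
The action of a suppressor variable can be shown with the standard two-predictor regression formulas. In the sketch, the test score is one predictor and a pure response-set measure is the other; the correlations are invented, and the function name is hypothetical.

```python
def two_predictor_weights(r_test_crit, r_set_crit, r_test_set):
    """Standardized regression weights and R^2 for two predictors.

    r_test_crit -- correlation of test score with criterion
    r_set_crit  -- correlation of response-set measure with criterion
    r_test_set  -- correlation of test score with response-set measure
    """
    denom = 1 - r_test_set ** 2
    beta_test = (r_test_crit - r_set_crit * r_test_set) / denom
    beta_set = (r_set_crit - r_test_crit * r_test_set) / denom
    r_squared = beta_test * r_test_crit + beta_set * r_set_crit
    return beta_test, beta_set, r_squared

# Hypothetical case: the set measure is unrelated to the criterion
# but correlates .60 with the test score.
beta_test, beta_set, r_squared = two_predictor_weights(0.5, 0.0, 0.6)
print(beta_test, beta_set, r_squared)
```

Although the response-set measure has no validity of its own in this example, it receives a negative weight, and the squared multiple correlation exceeds the squared validity of the test alone (.25), because the invalid variance it shares with the test is subtracted out.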

Capitalizing on response-set variance.—If response sets are
thought of as possibly contributing to validity, one may weight
the response sets in a way that maximizes their contribution.
Cook and Leeds (3) correlate each possible response on an
attitude scale for teachers with a criterion, and assign positive
or negative scoring weights accordingly. One item is as follows,
where the numbers in parentheses are weights:
                               1         2         3          4          5
It is sometimes necessary   Strongly   Agree   Undecided   Disagree   Strongly
to break promises to         agree                                    disagree
children.                     (0)       (4)      (-1)        (4)        (-1)

The criterion used was a dependable estimate of the ability of
teachers to establish rapport with children, which the scale
was supposed to predict. It will be noted that the scoring
weights are “illogical,” since there can be no stronger response
to “It is sometimes necessary. . . .” than to disagree (response
4), which amounts to saying “It is never necessary.” The
weights for responses 4 and 5 reflect the difference in response
set (not in logically considered opinion) between teachers in
the superior and inferior criterion groups. The defense of the
Cook-Leeds procedure, and the comparable method used in
Strong’s Interest Blank, is that it yields considerable validity.
The limitation is that invalid variance (Types 2 and 3b) is
weighted just like valid variance. A particular “good” teacher
who has a set to respond very emphatically will be penalized
by the weights. The majority of “good” teachers, who avoid
extreme responses, will be reliably discriminated by the key.
One difficulty with the sheer empiricism represented here is
that the weights serve their practical purpose but give little insight
into the nature of the variables tested. The only basis for
extending or improving the test is trial-and-error, developing
many more items of all sorts and trying them to see how the
weights come out.
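
Empirical response weighting of this general kind can be sketched briefly. The item weights are those quoted for the sample item; the rule used to derive weights from criterion groups (a scaled, rounded difference in proportions) is a stand-in chosen for illustration and is not the published Cook-Leeds formula. The group data are invented.

```python
ITEM_WEIGHTS = {1: 0, 2: 4, 3: -1, 4: 4, 5: -1}   # weights quoted in the text

def item_score(response, weights=ITEM_WEIGHTS):
    """Score one response (1-5) with empirically assigned weights."""
    return weights[response]

def empirical_weights(superior, inferior, scale=10):
    """Weight each response by how much more often the superior group uses it.

    superior, inferior -- lists of responses (1-5) from the two
    criterion groups. The weighting rule is hypothetical.
    """
    weights = {}
    for option in range(1, 6):
        p_sup = superior.count(option) / len(superior)
        p_inf = inferior.count(option) / len(inferior)
        weights[option] = round(scale * (p_sup - p_inf))
    return weights

superior = [2, 2, 4, 4, 4, 2, 4, 2, 3, 4]
inferior = [5, 5, 1, 3, 3, 5, 4, 2, 1, 5]
print(empirical_weights(superior, inferior))
print(item_score(4), item_score(5))
```

In the invented data the superior group avoids the extreme responses, so the derived key rewards moderate answers on either side of the question, just as the quoted weights do, without regard to what the answers say.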

Sometimes, instead of employing correction scores to refine
the total test score, one may modify the original test scores.
Thus Flanagan (23, p. 9) suggests scoring Rights and Wrongs
separately, and using each score in the multiple-correlation
when trying to predict a criterion. This procedure permits one
to weight “carefulness” variance separately from “ability” variance.
Work with true-false tests suggests that scores Rights-on-True-Items
and Rights-on-False-Items will have different validity
and may be assigned different weights in the predictor
score (5). Probably this notion could be extended further, in
empirical prediction.
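
Separate part-scores for a true-false test are easily obtained from the key. The sketch below is illustrative; the function name and the sample data are hypothetical, and the weights to be given each part-score would have to come from an empirical study of the criterion.

```python
def split_rights(responses, key):
    """Return (rights on true items, rights on false items, times marked True).

    responses, key -- sequences of booleans, one entry per item
    (True means the item was marked, or is keyed, "True").
    """
    rights_true = sum(r and k for r, k in zip(responses, key))
    rights_false = sum((not r) and (not k) for r, k in zip(responses, key))
    marked_true = sum(responses)
    return rights_true, rights_false, marked_true

key = [True, True, False, False, True, False]
responses = [True, True, True, True, False, False]   # an acquiescent student
print(split_rights(responses, key))
```

The acquiescent student in the example earns most of his credit on true items, so the two part-scores carry different information about him and need not receive the same weight.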

Summary 

This paper summarizes extensive evidence demonstrating
that such response sets as bias in favor of a particular alternative,
tendency to guess, working for speed rather than accuracy,
and the like, operate in conventional objective tests. Not only
are such sets widespread, but they reduce the validity of test
scores. The response set can be altered readily by alteration of
the directions or by coaching. Some studies show that response
sets are somewhat correlated from one test to another (but not
if the tests differ greatly in content), and that they are correlated
with important external variables. While response-set
variance may under certain circumstances enhance logical and
empirical validity, it appears that its general effect is to reduce
the saturation of the test and to limit its possible validity.

The following recommendations for practice, most of which 
were previously suggested, are reinforced by the present find¬ 
ings: 

1. Response sets should be avoided with the occasional ex¬ 
ception of some tests measuring carefulness or other personality 
traits which are psychologically similar to response sets. 

2. The forced-choice, paired-comparison, or “do-guess” multiple-choice
test should be given preference over other forms of
test item.

3. When a form of item is used in which response sets are
possible,

a) Directions should be worded so as to reduce ambiguity
and to force every student to respond with the same
set.

b) The test should not be given to a group of students for 
whom it is quite difficult. 

c) A response-set score should be obtained, and used to 
identify subjects whose scores are probably invalid. 

4. Where response sets are present, attempts should be made 
to correct for or to capitalize on the response set by an appro¬ 
priate empirical procedure. 

In view of the overwhelming evidence that many common
item forms invite response sets, and in view of the probability
that these sets interfere with accurate measurement, it will
rarely be wise to build new tests around item forms such as
A-U-D, Yes-No-?, and “check all correct answers.” It is to be
hoped that the tests forthcoming in the future will be designed
to increase their saturation with the factors the test is seeking
to measure.

The counseling process is generally recognized today as a
professional psychological function. Writers on the subject
agree that the counseling experience is a dynamic relationship
between two people, an ever-changing relationship to
which many variables contribute. This concept has emerged
as a result of three somewhat varied, yet related, types of research
on the counseling process: studies of evaluation, studies
of counseling methodology, and studies of factors operative
within the counseling interview. The present research study is
classified in the last group in that it is a consideration of factors
at work within the interview situation.

Research has shown that certain students seem to benefit
from the counseling process. On the other hand, other students
apparently do not benefit from this experience. Some students
appear to follow counselor suggestions and to accept information
more readily than do others. The question arises: Within
which interview situations do clients tend to accept information
presented, and within which do they tend not to accept
the data? Further, in what ways, if any, do students who
accept the information differ from those who do not? Also,
what types of information tend to be accepted, and what types
tend not to be?

In the present study “acceptance” is defined as favorable
reception by the client of information presented to him, as
demonstrated by (a) what the client says and (b) what the
client does. “Information” includes all data presented by the
counselor, whether they be in the form of advice, suggestion,
emphasis, recommendation, interpretation, request or explanation.
The type of interview in the present study is limited to
educational-vocational planning interviews.

Methodology of Study 

Utilizing one trained, experienced counselor, complete pho¬ 
nographic recordings were made of forty educational-vocational 
planning interviews. The clients used were University of Min¬ 
nesota first-quarter General College freshman men who volun¬ 
tarily sought counseling. They were typical of the General 
College population with regard to academic ability and voca¬ 
tional interests. 

Just prior to the actual recording, each of the clients completed
an “immediate pre-interview” form of inquiry pertaining
to his educational and vocational plans. Within several
days following the recorded interview, an interview during which
the client’s academic ability, interest, and aptitudes had been
discussed with him, with suggestions and recommendations
made by the counselor, each client completed an “immediate
post-interview” form of inquiry. In addition, the counselor
after each interview indicated on a check list his judgment
with regard to the emotional states of the client and counselor
and the degree of rapport achieved in the interview.

At one month and again at four months after the recorded 
interview, the investigator interviewed each of the clients in 
an effort to gain additional evidence for and against acceptance 
on the part of each client. Following this, all of the pre-inter¬ 
view and post-interview data were summarized for each case 
and presented to a team of three judges who, working inde¬ 
pendently, decided in which cases "acceptance” had occurred 
or had not occurred. A composite of the judges’ decisions was 
made in order to categorize the cases as acceptance or non- 
acceptance. 

In the meantime written transcriptions had been made of 
each of the forty recorded interviews. Following this, each of 
the client and counselor responses, numbering 12,238, were 
categorized into one of twenty-two categories. 

For the classification of counselor responses, Seeman’s nine
categories were used (4). These are: (a) counselor questions
dealing with content or factual data; (b) counselor questions
concerned with the attitudes and motivations of the client;
(c) counselor responses to content; (d) counselor responses to
feelings, attitudes and motivations of the client; (e) counselor
interpretations and opinions concerning content; (f) counselor
interpretations and opinions concerning client attitudes, feelings
and motivations; (g) suggestions, advice and counselor
decisions on courses of action for the client; (h) suggestions,
advice and counselor decisions concerning client attitudes and
feelings; (i) information given by the counselor.

In addition to Seeman’s nine categories, two additional counselor-response
categories were used in the present study. These
were: (a) unclassified and (b) simple agreement (“Yes,” “uh-huh”).

In the categorization of client responses, Snyder’s eight general
categories for client content and three general categories
for client-feeling responses were employed in the study (5).
The client-content categories are: (a) problem; (b) asking for
information; (c) disagreements; (d) answering questions; (e)
agreement; (f) insight; (g) planning; and (h) miscellaneous.

The client-feeling categories include: (a) positive attitudes 
(statements which reveal approval and acceptance of the client 
himself, the counselor or the counseling process or other per¬ 
sons, objects or situations); (b) negative attitudes (statements 
which reveal disapproval or rejection of the client himself, 
the counselor or the counseling process, or other persons, ob¬ 
jects or situations); (c) ambivalent attitudes. 

Pertinent personal data such as previous work experience, 
education and home background, as well as academic-aptitude 
test scores and interest and personality inventory results, were 
also gathered for each of the forty cases. 

Findings of the Study 

A composite rating of the judges showed that in twenty-six
of the forty cases the clients either “definitely” or “for the
most part” accepted information presented in the interview.
The other fourteen cases were divided among the “indecisive”
cases and the “definitely” and “for the most part” non-acceptance
cases. The heavy weighting of acceptance cases may be
attributed in part to the fact that the clients came voluntarily
for help with their problems.

The most important findings of this investigation pertinent 
to the dynamics of acceptance are the following: 

1. Client acceptance of information presented occurs most 
often in those situations in which both client and counselor 
are completely relaxed. When either of the two, or both, 
are not relaxed, acceptance is less likely to occur. 

2. Acceptance is directly related to “positive attitude” as expressed
by clients during the interview. Acceptance, on the
other hand, is inversely related to both negative and
ambivalent attitudes, as expressed by clients during the
interview.

3. Acceptance is directly related to a “readiness” for counseling
help. Merely having a “felt need” on the part of
the client does not necessarily mean that acceptance of
information, pertaining to that need, will occur. A readiness
to act with regard to a felt need appears to be the crucial
factor with regard to acceptance.

4. Information which is directly related to the client’s own 
immediate problem tends to be accepted. 

5. Information which is not in opposition to client self-concept
tends to be accepted. Further, information which shows
the client to be like others of his group tends to be accepted
whereas information which shows him to be deviate
tends not to be accepted.

Less crucial findings of the study are: 

1. The counselor used in the present study did not differ 
significantly in his counseling approach for the accept¬ 
ance and the non-acceptance groups. It was found that, 
as the interview progressed, he (a) asked fewer ques¬ 
tions, (b) gave more suggestions and directions, and (c) 
showed less simple agreement. He showed an increase 
in information-giving from the initial one-third of the 
interview to the middle one-third and then a decrease 
from the middle to the final one-third. He made little 
use of feeling-responses. 

2. Although the counselor’s approach in the interview situ¬ 
ation does not vary significantly in the present study, 



36 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT


some clients accept information presented, whereas 
others do not. This suggests the operation of factors 
other than the counselor's approach in the determination 
of client acceptance. 

3. Both acceptance and non-acceptance of information can 
occur in situations in which the client-counselor rela¬ 
tionship is friendly. Also, when an apathetic relationship 
is experienced, either acceptance or non-acceptance can 
occur. 

4. There appears to be a positive relationship between non- 
acceptance and the achievement of only a “surface” 
understanding of the problem by the client and the 
counselor, as indicated by the counselor's rating of the 
interview. 

5. Acceptance does not appear to be related to client use of
such categorized responses during the interview as (a)
statement of the problem, (b) answering of counselor
questions, (c) indications of insight gained, (d) indications
of plans, and (e) unrelated client discussion. The
data suggest (although the differences between acceptance
and non-acceptance groups are not statistically significant)
that acceptance may be related to client agreement
and inversely related to client disagreement, as
shown by client responses during the interview. Acceptance
may also be inversely related to the asking of
factual questions during the interview.

6. For both the acceptance and the non-acceptance cases,
as the interviews progressed, there was (a) a decrease in
client statement of the problem, (b) an increase, followed
by a tapering off, in the asking of questions by
the client, (c) a decrease in client answering of counselor
questions, (d) an increase in client agreement, and
(e) an increase in client statements pertaining to plans.
The non-acceptance group showed an increase in unrelated
statements, whereas the acceptance group was
constant with regard to unrelated data.

7. The non-acceptance cases, like the acceptance cases,
showed an increase in the expression of positive feelings
as the interview progressed. The level of expression of
positive feelings, however, was significantly lower
throughout the interview for the non-acceptance group.
The two groups likewise showed parallel patterns of
decrease of negativism and ambivalence.

8. Acceptance appears to be unrelated to the factors of
(a) the length of the interview, (b) the time of the day
of the interview, and (c) the proportion of time which
the client speaks during the course of the interview.

9. Acceptance is unrelated to (a) academic aptitude, (b) particular
measured personality patterns, (c) social status
of the client’s home, (d) veteran status, (e) marital
status, (f) part-time work status while in college, or (g)
the factor of previous client-counseling contacts.

10. For those judged to have “definitely” accepted information
presented, there appears to be a direct relationship
between acceptance and good first-quarter academic
achievement.

11. With regard to vocational interest patterns, acceptance 
appears to be related to the presence of interest profiles 
which contain all three types of interest patterns: pri¬ 
mary, secondary, and tertiary. Except for this finding, 
acceptance is unrelated to any particular vocational in¬ 
terest pattern. 

12. Different kinds of information are accepted equally well
by the acceptance and non-acceptance groups, with one
possible exception. Information which involves an altering
of previously made client plans tends to be more
often accepted by the group defined as the “acceptance”
group.

Conclusions and Implications for Counseling 

Conclusions obtained from the present findings and implica¬ 
tions for counseling follow: 

1. The importance of certain psychological factors in the
acceptance of information has been noted. The most conclusive
of all the findings, perhaps, is that acceptance is related to client
feeling, particularly feeling or attitude toward self. The importance
of an emotionally relaxed client-counselor relationship
has been shown. The factor of “readiness” has been indicated.
Further, it has been pointed out that information which
is directly related to the client’s own immediate needs is likely
to be accepted, as is information which does not oppose or injure
the client self-concept.

The counselor must recognize the presence of positive, negative,
and ambivalent attitudes of the client. If the client shows
a predominance of negativism and/or ambivalence, it may be
necessary that the counselor structure the counseling process
in such a manner that there would be a series of “preparation
for educational-vocational planning” contacts, devoted to the
development of proper client sets and attitudes. Once this is
done, acceptance of information pertaining to educational-vocational
planning might take place more readily.

On the other hand, if the client demonstrates a warmth
toward the interview, toward the counselor, toward himself,
as well as toward others, the planning interview can proceed
and the counselor may feel reasonably certain, other factors
being equal, that acceptance of information will occur.

The finding that there is an increase in positive expression
and decreases in negativism and ambivalence for the non-acceptance
cases, as the interview progresses, poses an interesting
problem. In the first place, this finding should
indicate to the counselor that not all clients who “warm up”
during the interview will necessarily accept. More important,
however, this demonstrated rise in positive feelings may
be interpreted as an encouraging sign, one that may be
indicative of acceptance at some later time, provided the
client is given proper orientation and preparation for the
educational-vocational planning session.

On the other hand, this warming-up may be merely an expression
of a pleasant social convention. In our culture all of
us are taught to be as agreeable as possible, to put our best
social face forward. Hathaway (3) has called this tendency to be
pleasant and to express formal gratitude at the end of the
interview the “hello-goodbye” convention. He warns
against utilizing such expressions of goodwill in the interview
as measures of the effectiveness of the interview. Hence this
rise in positive feelings may be only a measure of the social
graciousness of the client.

Closely allied is the factor of “readiness.” Negativism and
ambivalence may actually be indicative of a lack of readiness
in some cases. The counselor must determine whether or not
the client is ready for educational-vocational planning. If he
is not ready, perhaps there will need to be “preparation for
planning” contacts, as previously mentioned. In this connection
one should not forget Butler’s differentiation between the
adjustment and distributive phases of counseling (2). He contends
that the distributive phase (Kefauver’s term; “planning
phase” might be even more appropriate) should
not be entered upon until the client’s adjustment to the present
and to himself has been assured. Thus the “preparation for planning”
spoken of here may mean adjustment counseling using permissive
methods of treatment. The “readiness to act” mentioned
earlier may merely mean a lack of preoccupation with areas
of self-regard other than those associated with the planning
at hand.

The establishment of an emotionally relaxed relationship
between client and counselor is, apparently, necessary before
information will be accepted. The importance of good rapport
in the interview relationship has long been recognized. The
importance of an emotionally relaxed state as a contributor
to good rapport is specifically noted here. The counselor is
obligated to establish, insofar as possible, such a relationship.

The counselor must recognize what needs are most immediate
and most pertinent to the client. That which the client recognizes
as real, not what the counselor sees, is most important. Accordingly,
if acceptance is to occur, it is necessary to start at
the level where the client operates at the moment. The counselor
must use techniques designed to assist in the development
of the client to a more realistic awareness of himself. Here is
introduced again the factor of readiness or of need for self-adjustment,
showing how the factors related to acceptance are
not discrete but intertwined.

The importance of starting at the level of thinking of the
client is given further support by the present finding that
information not in line with previous client plans tends to be
rejected. The counselor must be aware of these previous plans,
goals and objectives of the client. It is necessary for him to
recognize them and to take them into consideration if acceptance
is to occur.

2. The lack of relationship between acceptance and the content
of client response during the interview serves in a sense
to point up and to give emphasis to the importance of client
feeling. It appears that the counselor will do well to pay less
attention to the content of what the client says and more to
how the client feels.

3. Such traditionally stressed
factors as academic aptitude, personality patterns, vocational
interest patterns, home background and previous counseling
contacts show little or no relationship to acceptance. These
data, useful as they are in some situations, do not seem to be
crucial insofar as acceptance is concerned. The acceptance of
information presented apparently can occur in spite of low
academic ability or a poor home background. Likewise, such
factors as the length of the interview, the time of day of the
interview, and the proportion of time in which the client speaks
during the interview may not deserve the attention they are
sometimes given, at least as far as acceptance is concerned.

4. The client with his needs, his wants and desires, his atti¬ 
tudes and feelings is the basic determiner of whether or not 
acceptance occurs. The data suggest that the client himself is 
more important than the interview situation itself or the type 
of information presented. 

5. Certain individuals may benefit little, if any, from a particular
counseling contact. In the present study with a group
of college freshmen (typical of General College freshmen in
general), there appears to be little reason for admitting that
many of these students would never accept information presented
during an educational-vocational planning interview.
The evidence seems to indicate that all of the clients in the
present group, with further attention, might develop to a state
of acceptance. The possibility needs to be explored that there
would be a greater acceptance of test information if test selection
were made by the clients in the manner suggested by
Bordin and Bixler (1). Theoretically such client-chosen tests
would be in personality areas where there is adequate “readiness”
for acceptance of results. Any attempt to assist the client
toward a realistic self-acceptance must always start at the level
of the client and must be developed at a pace agreeable to the
client. Such a technique assumes that the counselor has sufficient
insight to recognize underlying client problems and to
see them in the total picture of the client’s existence.

A Proposal For Further Research 

The present study might be regarded as a “pilot study,”
inasmuch as it was limited in scope and was a pioneering venture
with regard to the methodology of the problem. For practical
considerations the size of the sample was limited. In
order to permit generalization from the small sample,
the sample was restricted to one stratum of the college population,
thereby securing a more homogeneous group. To limit the
variables operative within the interview situation, only one
counselor was used. These limitations were deemed necessary
for the present study. A similar study should be carried out in
which the following conditions might be observed.

1. A larger sample, representative of the total college popula¬ 
tion, should be utilized. 

2. Several trained counselors should be employed to do the
counseling. The counselors used should be known to vary
from the more “non-directive” to the “directive” approach
with regard to counseling philosophy and methodology.
They might include those avowedly “eclectic” or those
psychoanalytic in orientation.

3. Recorded data should include the entire series of contacts 
with each case rather than only one interview. 

4. Problems not only pertaining to educational-vocational 
planning needs, but to other problem areas as well, should 
be studied. 

5. Careful investigation of pre-interview behavior and rigorous
observation of post-interview behavior should be done.

If the above suggestions were followed, a pool of data would
be available which would provide many answers concerning
the dynamics of the counseling process. Such a pool would
provide data for any degree of intensity or extensiveness that
would be desired. Acceptance of data pertaining to emotional
and personality problems might be investigated. Detailed analysis
of client sets which are brought to the interview could
be made. Likewise, such other psychological aspects of the
interview as client motivation and the interaction of two personalities
participating in the interviews could be studied. Studies
of counselor methodology and careful analysis of moments
of the interview could be made. The agency or institution
which is willing to provide sufficient backing for such an enterprise
will step into a position of leadership in counseling research.


THE CONCEPTS OF RELIABILITY
AND HOMOGENEITY


C. H. COOMBS¹
University of Michigan

I. Introduction 

The literature of test theory is replete with articles on the 
computation and interpretation of indices of reliability. In 
them one finds surprisingly little common agreement or even 
mutual understanding (6). In more recent years the concept of 
homogeneity, with its indices, has been added, with the result 
that the confusion has increased. We shall make no effort in 
this paper to review and summarize this literature but shall 
attempt to do three things: 

(1) point out what we regard as the fundamental sources of 
this confusion; 

(2) provide a theoretical foundation on the basis of which 
this confusion might be resolved; 

(3) point out the further steps that must be taken to develop 
the theory and practice of mental testing. 

II. Sources of Present Confusion

There are two fundamental sources² of confusion in present
test theory: one is the assumptions by means of which we arrive
at an interval scale (3), and the second is the identification of
our statistical indices with the concepts they are presumed to
measure. These two basic difficulties are intimately related and
are both associated with our attempt to model psychological
measurement on physical measurement. Let us discuss them
briefly, in turn.

¹ This paper is an extension to the area of mental testing of some of the ideas contained
in a chapter in a general theory of psychological scaling developed in 1948-1949
under the auspices of the Rand Corporation and while in residence in the Department
and the Laboratory of Social Relations, Harvard University. While the author carries
the responsibility for the ideas contained herein, their development would not have
been possible without the criticism and stimulation of Samuel A. Stouffer, C. Frederick
Mosteller, Paul Lazarsfeld, and Benjamin W. White in a joint seminar during that
year. Development of the theory before and after the sojourn at Harvard was made
possible by the support of the Bureau of Psychological Services, Institute for Human
Adjustment, Horace H. Rackham School of Graduate Studies, University of Michigan.
A version of these ideas was presented in a 1949 APA symposium on Test Homogeneity
and Test Validity.

² A complete discussion of the fundamental difficulties in present test theory is to
be found in Thomas (j).

Consider the manner in which data are obtained in the area
of mental testing. The method used is the method of single
stimuli, in which there is one response from each individual to
each stimulus. These responses comprise our basic data, and
consist of two piles of items for each individual. One pile has
the items which the individual passed and the other pile those
items which he failed. Note that there is no information in
the data for a given individual pertaining to (1) how well he
passed one item compared with another, or (2) how badly he
failed one item compared with another, or (3) finally, how badly
he failed one item compared with how well he passed another.
The only way to obtain metric relations in data collected by the
method of single stimuli is to put the information in the data by
means of a priori statistical assumptions concerning, for example,
the shape of the distribution function of the abilities of
the individuals on the attribute in question. A normal distribution
is usually what is assumed in test theory, but even this
is not applied in a thoroughgoing fashion.

To carry out the assumption fully, (1) the percentage passing
each item should be corrected for chance, then (2) converted
to a sigma score, and (3) items at equal intervals on this sigma
scale should be selected for a final form. This procedure is
usually not rigorously adhered to because, in the first place, it
makes little practical difference, in many instances, if the items
are not precisely distributed in a discrete rectangular distribution
on this sigma scale. But there is another reason why it is
not insisted that this procedure should be rigorously adhered
to, and that is because the assumptions which lead to a unit
of measurement implicitly require the further assumption of
perfect homogeneity. The distrust of the procedure is supported
by the fact that the assumption of perfect homogeneity can
usually, if not always, be shown to be violated, even in such
crude data as that collected by the method of single stimuli.
Unfortunately, to many this is simply regarded as one of the
sources of error variance and not as a fundamental theoretical
obstruction.
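
The three steps of the normal-curve procedure described above can be sketched in code. This is an illustration under stated assumptions only: the correction for chance uses the conventional formula for an item with a fixed number of alternatives, which the text does not spell out, and the function names, the nearest-item selection rule, and the sample proportions are hypothetical.

```python
from statistics import NormalDist


def correct_for_chance(p_observed, n_choices):
    """Step 1: remove the share of passes expected from blind guessing.

    Assumes the conventional correction for an item with n_choices
    equally attractive alternatives (an assumption, not given in the text).
    """
    chance = 1.0 / n_choices
    return (p_observed - chance) / (1.0 - chance)


def sigma_score(p_corrected):
    """Step 2: convert a corrected proportion passing to a sigma score.

    Under the assumed normal distribution of ability, an item passed by a
    proportion p sits at the deviate that cuts off the upper p of the curve.
    """
    return NormalDist().inv_cdf(1.0 - p_corrected)


def select_equally_spaced(sigmas, targets):
    """Step 3: for each target scale position keep the nearest unused item."""
    chosen = []
    for target in targets:
        unused = [k for k in range(len(sigmas)) if k not in chosen]
        chosen.append(min(unused, key=lambda k: abs(sigmas[k] - target)))
    return chosen


if __name__ == "__main__":
    observed = [0.95, 0.80, 0.625, 0.45, 0.30]  # hypothetical proportions passing
    corrected = [correct_for_chance(p, 4) for p in observed]
    sigmas = [sigma_score(p) for p in corrected]
    print(select_equally_spaced(sigmas, [-1.0, 0.0, 1.0]))
```

Nothing in these steps tests the normality assumption itself; the sigma scale is produced by the assumption, which is the point made in the paragraph above.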

Thus, in the method of single stimuli as applied to mental 
testing we create an interval scale without any built-in or in¬ 
herent test of its validity. Having such a scale, then, it is per¬ 
missible to use certain properties of numbers, and we have 
available a variety of statistical procedures for the analysis 
of behavior. We must, of course, allow for error variance, much 
of which we have put there ourselves in assuming an interval 
scale, and, consequently, a statistical theory of error becomes 
necessary and plays a dominant role in test theory. This, then, 
is one major source of difficulty in the area of tests and measure¬ 
ments but, important as it is, it is not as fundamental as the 
second source. The difficulty arising from assumptions lead¬ 
ing to an interval scale is of significance primarily to the em¬ 
pirical aspect of psychological testing rather than to the theo¬ 
retical aspect. 

The second source of difficulty, which we consider to be of 
prime theoretical significance, has, however, arisen from the 
use of an interval scale. Basically, this second source of confu¬ 
sion is the fact that we have had no fundamental psychological 
rationale underlying our concepts in test theory. Rather, we 
find an easy road to the concepts of test score, difficulty of an 
item, reliability and homogeneity via statistical definitions of 
indices dependent upon the existence of an interval scale. We 
set up these statistical indices based on operational procedures, 
then give names to them and act as if they have certain obvious 
psychological meanings. We have gained readily obtainable 
empirical indices but have paid for them in psychological am¬ 
biguity and imprecise meanings and interpretations. While rela¬ 
tively easy to compute and apparently readily susceptible to 
empirical study, an invalid assumption of an interval scale 
would vitiate even their numerical precision. Thus, we have 
not one but many indices of reliability, each determined in 
a different way, and hence each implying a different meaning. 
We do not have, independently, a quantitative definition of
the concept of reliability, psychologically derived, with a unique
interpretation. We have a variety of meanings for the concept
of reliability, depending upon the index used. It is our thesis
that the concept of reliability should have a unique psychological
meaning quantitatively defined, and the various indices
should then be regarded as different kinds of approximations
to the concept. The challenge, then, would be to the experimenter
to devise indices which are better measures of the concept.

III. A Psychological Rationale for the Concepts of Reliability
and Homogeneity

The Fundamental Equation. We shall now attempt to sketch
a theoretical psychological foundation for the derivation of
quantitative definitions of certain concepts of test theory.

Consider the concept of the difficulty of an item. We all
have intuitive notions as to what the psychological meaning
of the difficulty of an item is. It means how hard it is for someone
to pass it. But we identify the difficulty of an item with
the percentage of people passing it. We thus have a number to
represent the difficulty of an item which is the same number
for all the people in the sample. Yet we know that for some
people the item was so easy that they passed it, and for others
it was so difficult that they failed it. It is apparent that we
must have a definition of the difficulty of an item which will
permit different values for different people. Of course, such a
definition could still permit an average difficulty corresponding
in principle to the conventional definition.

In order to develop a psychological rationale for the difficulty
of an item let us consider an arithmetic problem. Let this
arithmetic problem require that an individual know how to
perform certain operations. The problem might involve addition
and subtraction, the use of log tables, and a certain amount
of reasoning. Its solution requires a collection of abilities, each
to a certain degree and combined in a certain way. We may,
for the sake of simplicity in discussion, lump this particular
combination of abilities and call it a single ability. The problem
then requires that every individual possess at least a certain
amount of this ability in order to solve it. We shall call the
quantity of an ability required for the solution of a problem
the Q value of that problem or that item.

Shall we regard this Q value of an item as its difficulty? We
might, if we wish, so define the difficulty of the item. But this
is not psychologically satisfying, because if we ask individuals
how difficult an item is, some will say that it is easy and some
will say it is difficult. How can the item have one Q value and
yet give rise to all this disagreement about its difficulty? Obviously
it must be because these different individuals are making
their judgments from different points of view. A mathematics
major says it is easy; a grammar school student says
it is hard. The point of view depends on the amount of this
particular ability the person has. Of the particular ability demanded
by the item, the amount possessed by an individual
will be designated his C value, representing his capacity.

We have now a hypothetical continuum on which is a Q
value representing the amount of an ability required by the
item from any individual to whom it is administered, and we
have also a C value on this same continuum for each individual
who attempts the item. How, then, shall we represent the
degree of difficulty that this item has for a particular individual?
This might be done in a number of ways. We have chosen
to use the ratio of Q to C to represent the psychological value
or difficulty of this item for that individual and have called
this ratio P, and thus we have the simple equation:

(1)    $Q = PC$

Obviously, the greater an individual’s capacity, the smaller the
proportion of that capacity that is required or exercised in solving
the problem and the easier it appears to him.

Each time (h) an individual (i) responds to a stimulus (j) there
is a set of values which satisfy $Q_{hij} = P_{hij} C_{hij}$. The most frequent
objectives of psychological measurement are to determine
something about the Q values of each member of a set of stimuli
and the C values of each member of a group of individuals.

But note, and this is significant to our later problem of
metric, we do not observe Q values and C values. Instead, what
we observe are the P values. Thus, if an individual passes an
item, we know that on that particular ability the individual’s
capacity,³ $C_{1ij}$, was greater than the quantity, $Q_{1ij}$, required to
pass the item and hence the $P_{1ij}$ value was less than one. In
the method of single stimuli, which is the method most used
in mental testing, we can divide the items into two categories
for each individual, those whose P values were less than one
for him, and those whose P values were greater than one.⁴
From such data on several individuals we want to extract what
information they contain about Q and C values. If we refuse
to make the assumptions which lead to an interval scale, exhaustive
analysis of these data would yield, at best,⁵ the order
of the stimuli (their Q values) and the order of the people (their
C values).

³ The subscript h is one here.
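
The relation $Q = PC$ and the pass-fail dichotomy it produces can be illustrated with a short sketch. The function names and the numerical Q and C values are hypothetical, the Q and C values are taken as constant over administrations, and treating a P value of exactly one as a failure is an assumption the text does not settle.

```python
def difficulty_ratio(q, c):
    """P = Q / C: the share of a person's capacity that the item demands."""
    return q / c


def passes(q, c):
    """An item is passed when P < 1 (P == 1 counted as failure: an assumption)."""
    return difficulty_ratio(q, c) < 1.0


def response_table(q_values, c_values):
    """Rows are individuals, columns are items; 1 = passed, 0 = failed."""
    return [[1 if passes(q, c) else 0 for q in q_values] for c in c_values]


def rank_order(table):
    """Recover only the ORDER of people and of items from the pass-fail table."""
    person_totals = [sum(row) for row in table]
    item_totals = [sum(col) for col in zip(*table)]
    people = sorted(range(len(person_totals)), key=lambda i: person_totals[i])
    items = sorted(range(len(item_totals)), key=lambda j: -item_totals[j])
    # people run from least to most able, items from easiest to hardest
    return people, items


if __name__ == "__main__":
    capacities = [9.0, 1.0, 6.0, 3.0]          # hypothetical C values
    table = response_table([2.0, 5.0, 8.0], capacities)
    print(table)
    print(rank_order(table))
```

Two different sets of hypothetical Q values, such as 2.0, 5.0, 8.0 and 2.5, 4.0, 8.9, yield the same table for these four capacities. That is the sense in which the method of single stimuli, without further assumptions, carries order but not metric.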

We might digress for a moment to point out that with other
methods of collecting data, such as the method of rank order,
the method of paired comparisons, and the method of triads,
we are able to collect, successively, much more information
about the P values of stimuli for each individual and hence
learn more about Q values and C values than we do from the
method of single stimuli used in mental testing. Curiously
enough it appears that we are going to be able to go further,
with fewer assumptions, in the area of so-called qualitative
attributes than in the area of mental testing.

The Variance of an Individual's Score. Imagine now that
we have a stimulus or test item and a group of individuals who
respond to it. Each individual’s response to the item provides
a P value. Of course we do not know the exact magnitude of a
P value; we know only whether it is less than one or greater
than one, that is, whether the individual passed or failed the
item. But this is a limitation of this method of collecting
data. Let us imagine that we had a method which would give
us the exact P values. There would be, then, a distribution of
P values for the stimulus. This distribution represents the
distribution of difficulties which the item has for the individuals
in the group.

Each individual has one of the P values in this distribution.
Let us imagine that we could again administer this item to this
same group of individuals independently⁶ of its previous administration.
Then, once again, each individual would have a
P value for this item. Would the successive P values of an
individual for the one stimulus be identical, even if the successive
administrations were independent? This is a question
of whether or not $P_{hij}$ is constant over h for a given i and j
and can only be answered by experiment. It might well be
that in the case of one attribute, say arithmetic, these successive
P values would be almost constant for any given individual,
whereas in the case of another attribute, say the aesthetic
merit of a painting, the P values might be greatly variable.
In this latter case we would expect the P values to be variable
if the individual was not too clear as to just what he meant by
aesthetic merit and hence used different criteria in successive
evaluations of the painting. Thus, if the continuum is intrinsically
different at different times, both the Q values of
the stimulus and the C values of the individual would be variable
for the same nominal trait, like aesthetic merit, because the
exact composition of the trait was variable.

⁴ We have avoided the complication introduced by the true-false and multiple-choice
type of item in which an individual may get an item right by pure chance.
There is no need for this complication from the point of view of constructing a theory.

⁵ The conditions necessary are that $Q_{hij}$ be constant over h and i and that $C_{hij}$ be
constant over h and j. For purposes of future generalization these constitute an extreme
of class 1 conditions ft).

⁶ Experimental independence.

We have conceived, now, of each individual in a group having
responded a number of times to a stimulus and, hence, for
each individual, i, there is a distribution of $P_{hij}$ values for the
stimulus j. Let us now do the same thing for more stimuli, and
imagine that there is for every individual a small distribution
of his P values for each stimulus within the total distribution
of all individuals’ P values for each stimulus. The notation
used is as follows:

h = 1, 2, ..., t (the number of times an individual responds to
a stimulus)

i = 1, 2, ..., N (the number of individuals)

j = 1, 2, ..., n (the number of stimuli)

$P_{ij} = \frac{1}{t} \sum_h P_{hij}$

$P_i = \frac{1}{nt} \sum_j \sum_h P_{hij}$

$P_j = \frac{1}{Nt} \sum_i \sum_h P_{hij}$

$P = \frac{1}{Nnt} \sum_i \sum_j \sum_h P_{hij}$

We are now in a position to define the status score, $S_i$ (2),
of an individual as follows:

(2)    $S_i = \frac{1}{n} \sum_j (P_j - P_{ij})$

or

(3)    $S_i = P - P_i$

To put the status score of an individual in words, it is defined
as the average difficulty of all the items for all individuals
minus the average difficulty of all the items for him alone. Thus,
we have made the score of the individual dependent upon the
composition of the group of individuals of which he is a member.
On this scale the average individual has a score of zero, and
the better the individual the higher his score, since the easier
the items are for an individual the smaller the proportion of
his capacity is required to pass them and the larger would be
$S_i$. Individuals below average would have negative status
scores.
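
The status score can be computed directly from a table of P values, as in the sketch below. The P values are hypothetical (the text stresses that exact P values are not observable by the method of single stimuli), the layout `P[i][j]` as a list of t repeated values is an illustrative choice, and every individual is assumed to respond to every item the same number of times.

```python
def cell_means(P):
    """P[i][j] is the list of t hypothetical P values of individual i on item j."""
    return [[sum(reps) / len(reps) for reps in person] for person in P]


def status_scores(P):
    """Return S_i = P - P_i for every individual (equation 3)."""
    p_ij = cell_means(P)
    p_i = [sum(row) / len(row) for row in p_ij]   # mean difficulty for each person
    grand = sum(p_i) / len(p_i)                   # mean difficulty for the group
    return [grand - value for value in p_i]


def status_scores_by_item(P):
    """Same scores from equation 2: the mean over items of P_j - P_ij."""
    p_ij = cell_means(P)
    n_people, n_items = len(p_ij), len(p_ij[0])
    p_j = [sum(p_ij[i][j] for i in range(n_people)) / n_people
           for j in range(n_items)]
    return [sum(p_j[j] - p_ij[i][j] for j in range(n_items)) / n_items
            for i in range(n_people)]


if __name__ == "__main__":
    # three individuals, three items, two administrations each (hypothetical)
    P = [
        [[0.5, 0.7], [0.9, 1.1], [1.4, 1.2]],
        [[0.4, 0.4], [0.8, 0.6], [1.0, 1.2]],
        [[0.9, 1.1], [1.3, 1.5], [2.0, 1.8]],
    ]
    print(status_scores(P))
    print(status_scores_by_item(P))
```

The two functions agree up to rounding, the scores sum to zero over the group, and the individual for whom the items are easiest receives the highest score, as the text describes.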

Inasmuch as, in principle, an individual has a score, an A'„ on 
every item every time he takes it, let us consider the composi¬ 
tion of the variance of all these ‘'scores” that get averaged 
together for a total score. If we designate by t\ the total vari¬ 
ance of an individual, we have 

( 4 ) - i ? ? (Pi - PkuY - «5\* 

By adding and subtracting $P_{ij}$ inside the parentheses, expanding
and collecting terms, the expression for $V_i$ becomes:

(5)  $V_i = \frac{1}{nl} \sum_j \sum_h (P_{ij} - P_{hij})^2 + \frac{1}{n} \sum_j (P_j - P_{ij})^2 - \left[ \frac{1}{n} \sum_j (P_j - P_{ij}) \right]^2$
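The intermediate algebra is not written out in the text. A sketch of it follows, on the assumption that $P_{hij}$ is a single response value, $P_{ij}$ its mean over the $l$ repetitions, $P_j$ the mean for stimulus $j$ over all individuals and repetitions, and $P_i$ and $P$ the individual and grand means.

```latex
\begin{align*}
P_j - P_{hij} &= (P_j - P_{ij}) + (P_{ij} - P_{hij}), \\
\sum_h (P_j - P_{hij})^2
  &= l\,(P_j - P_{ij})^2 + \sum_h (P_{ij} - P_{hij})^2
     + 2\,(P_j - P_{ij}) \sum_h (P_{ij} - P_{hij}), \\
\sum_h (P_{ij} - P_{hij}) &= l\,P_{ij} - \sum_h P_{hij} = 0, \\
S_i = P - P_i &= \frac{1}{n} \sum_j (P_j - P_{ij}).
\end{align*}
```

Dividing the second line by $nl$, summing over $j$, dropping the cross term by the third line, and writing $S_i$ in the form of the last line gives the three-term expression for the total variance of an individual.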

Making the following definitions,

(6)  $D_i = \frac{1}{nl} \sum_j \sum_h (P_{ij} - P_{hij})^2$,

(7)  $T_i = \frac{1}{n} \sum_j (P_j - P_{ij})^2 - \left[ \frac{1}{n} \sum_j (P_j - P_{ij}) \right]^2$,

we have

(8)  $V_i = D_i + T_i$

and $V_i$ is seen to have two components. These two components,
$D_i$ and $T_i$, are of psychological significance. The first component,
$D_i$, we call the individual’s dispersion score and it
represents the variability within an individual in repeatedly
responding (independently) to the same stimulus, summed over
all the stimuli. $D_i$ reflects an individual’s internal consistency
in responding repeatedly to the same stimuli. The contribution
that is made to this component by each stimulus is essentially
the precision of the individual’s score on each item, and when
summed over the items is a measure of the precision of the
individual’s total score on the test.

The $T_i$ component describes the variability of the individual’s
mean position within the group as the group passes from stimulus
to stimulus. We call this score the individual’s trait score.

Thus, we now have two concepts to represent the hypo¬ 
thetical behavior of an individual in response to repeated inde¬ 
pendent presentations of a set of items. We have the concept 
of a dispersion score which represents the precision of an indi¬ 
vidual’s final total score on the test. And we have the concept 
of trait score which represents the stability of an individual’s 
position within the group in passing from item to item. 
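The decomposition of an individual's total variance into a dispersion score and a trait score can be checked numerically. The sketch below is illustrative only: the function name `decompose`, the nested-list layout `P[h][i][j]`, and the numbers are assumptions made for the example, and the formulas follow the reading of the definitions in which each stimulus mean is taken over all individuals.

```python
from statistics import fmean

def decompose(P):
    """P[h][i][j] is the value for repetition h, individual i, stimulus j.

    Returns, for each individual, the status score S, the total variance V,
    the dispersion score D, and the trait score T.
    """
    l, N, n = len(P), len(P[0]), len(P[0][0])
    # Mean over repetitions for each individual and stimulus.
    P_ij = [[fmean(P[h][i][j] for h in range(l)) for j in range(n)]
            for i in range(N)]
    P_i = [fmean(P_ij[i]) for i in range(N)]                          # individual means
    P_j = [fmean(P_ij[i][j] for i in range(N)) for j in range(n)]     # stimulus means
    P_all = fmean(P_i)                                                # grand mean
    results = []
    for i in range(N):
        S = P_all - P_i[i]
        V = fmean((P_j[j] - P[h][i][j]) ** 2
                  for j in range(n) for h in range(l)) - S ** 2
        D = fmean((P_ij[i][j] - P[h][i][j]) ** 2
                  for j in range(n) for h in range(l))
        dev = [P_j[j] - P_ij[i][j] for j in range(n)]
        T = fmean(d * d for d in dev) - fmean(dev) ** 2
        results.append({"S": S, "V": V, "D": D, "T": T})
    return results

# Hypothetical data: 2 repetitions, 2 individuals, 2 stimuli.
P = [
    [[0.2, 0.4], [0.6, 0.8]],
    [[0.4, 0.4], [0.6, 0.6]],
]
results = decompose(P)
for i, r in enumerate(results):
    print(i, r)
```

In this toy case each individual keeps the same position relative to the stimulus means on both stimuli, so the trait score is zero and the whole of the total variance is dispersion.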

Reliability and Homogeneity.—We shall now identify $D_i$ and
$T_i$ with the concepts of reliability and homogeneity, respectively.
We have here precise definitions of concepts from a
psychological rationale such that the concepts may be manipulated
mathematically and are susceptible to rigorous logic.

We shall use the terms $D_i$, dispersion score, precision, and
reliability interchangeably; and the terms $T_i$, trait score, and
homogeneity interchangeably. First, it is apparent from the
mathematical definition of the concept of precision that it is a
characteristic of an individual’s behavior on the items comprising
the test, and does not necessarily have the same value for
every individual who takes a particular test. To put this in the
more common terms of test theory, the reliability of a test or,
as we define it, the precision of an individual’s test score, may
be different for every individual who takes the test. It is an
approximation of unknown degree to assign the same coefficient
to all individuals. This approximation, perhaps, would be
reasonably close in the case of some mental tests, but in others
the individual differences in $D_i$ might be considerable.

The relation between reliability and homogeneity is an interesting
one. In principle we could construct a test which would
have high precision, or reliability, and such that the items would
have zero intercorrelations, or, for that matter, any values
from plus one to minus one. Thus, if a man’s score on one item
was the number of children he has and on another item his
cephalic index, and on a third item the number of clubs and
societies he belongs to, his total score would have very high
reliability. It does not necessarily follow, however, that the
score means anything, that is, that it represents a point on a
continuum which is a psychological trait continuum. Obviously,
then, the fact that one has high precision for a test score has
no bearing on whether or not one is measuring some kind of
meaningful psychological entity. If one takes a number of
things which are qualitatively different and adds up the scores
on these different things for each individual, then the total
scores will be a set of numbers which may have the property
of precision but will have no common quality.

Let us turn now to the trait score which we identify with
homogeneity. This denotes the stability of an individual’s position
within a group. Such a measure would not be an exclusive
property of an individual, as in the case of precision, but is a
property of the group as a whole on the test, and hence $T_i$
should be averaged over the individuals.

The significance of this concept lies in its indicating the
degree to which the final total scores of individuals have some
common quality or represent a psychological entity for the
group. The expression for the trait score, $T_i$, averaged over
individuals, is essentially equivalent to the notion of correlation
between items, except that it is expressed in terms of variance
rather than correlation or covariance. 7

Thus, if we have a test consisting of a number of items, each
from a different primary mental ability, we would expect the
position of the individual within the group from item to item
to be variable. This is on the premise that there are intra-individual
differences in ability. On the other hand, if the test
were a set of arithmetic items then the position of the individual
within the group as it passed from item to item would probably
be relatively stable and there would be a high degree of homogeneity.
These two tests might well have equally high reliability
but quite different homogeneities.

7 Another way of looking at $D_i$ and $T_i$ is by analogy with error variance and true
variance in conventional test theory. The analogy between $D_i$ and error variance is
justified. But $T_i$ is a variance generated by lack of homogeneity among the items.
Hence, in the sense used here, the “true variance” would represent the degree to
which the items failed to constitute an organized and integrated common trait.

In principle, the two components $D_i$ and $T_i$ are independent
and it is not difficult to imagine a test with perfect precision
for all individuals, or perfect reliability, and with a degree of
homogeneity anywhere from zero to perfect. On the other hand,
in a probability sense, it would perhaps be much more difficult
to construct a test with perfect homogeneity but with low precision.
Such a relation is implicit in the reasoning behind the
attempt to increase the reliability of a test by means of an item
analysis against an internal criterion.
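The independence of the two components can be illustrated with invented numbers. In the sketch below every individual is assumed to give identical values on each repetition, so the dispersion score is zero by construction (perfect precision), and only the trait score is computed. The function name `mean_trait_score` and both data sets are hypothetical.

```python
from statistics import fmean, pvariance

def mean_trait_score(P_ij):
    """Average trait score over individuals.

    P_ij[i][j] is individual i's mean value on stimulus j. The trait score of
    an individual is taken as the variance, over stimuli, of his deviation
    from the stimulus mean of the whole group.
    """
    n = len(P_ij[0])
    P_j = [fmean(row[j] for row in P_ij) for j in range(n)]
    return fmean(pvariance([P_j[j] - row[j] for j in range(n)]) for row in P_ij)

# Individuals keep the same rank and distance on every stimulus.
parallel = [[0.2, 0.4, 0.6], [0.3, 0.5, 0.7], [0.4, 0.6, 0.8]]
# Individuals change position within the group from stimulus to stimulus.
crossing = [[0.2, 0.6, 0.4], [0.5, 0.3, 0.7], [0.8, 0.6, 0.4]]

print(mean_trait_score(parallel))
print(mean_trait_score(crossing))
```

Both hypothetical tests are equally precise, yet the first has a trait-score variance of essentially zero (high homogeneity) and the second does not.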

Indices.—We have reached a point now where we must consider
again the distinction between the defined meaning of a
concept and the index which presumably is a measure of the
concept. What we have tried to do is to provide meaningful
definitions of the concepts of precision and homogeneity but we
have not provided an index for either one of these concepts. An
index is simply a method of analyzing data to get certain information.
Hence, in order to compute a meaningful index, the
data must contain this information. Consider, for example,
what is required of the data so that they will contain information
about the precision of an individual’s score. We can see
that to get a measure of precision, that is, to compute an individual’s
dispersion score, requires repeated independent responses
from him to the same item. The method of single
stimuli conventionally used in mental testing does not provide
such observations. Thus, it appears that with conventional
testing methods an index of the reliability of a test score is
indeterminate and there is no valid formula for reliability. On
the other hand, the $T_i$ component of an individual’s total
variance requires only one observation per individual per stimulus
and, hence, data collected by the method of single stimuli
do contain information pertaining to the concept of homogeneity.
But samples of size one are poor estimates of the mean
of a distribution. Nevertheless, they can be used to get an estimate
of the variance between distributions which is, however,
contaminated by the variance within the distributions. The two
components, $D_i$ and $T_i$, of the total variance cannot be separated
in data collected by the method of single stimuli. In other
areas, a method for collecting data like the method of paired
comparisons or the method of triads does provide information
pertaining to both components and it is possible in principle to
measure them both.

Essentially, what we have done is to give the quantitative 
definition of concepts based on a psychological rationale prece¬ 
dence over the statistical procedure of computing an index and 
then arguing about what the index means. We have chosen to 
have meaningful concepts and to recognize that our measures 
of them are inadequate and approximate rather than to take 
the measures as experimental facts and try to give them psycho¬ 
logical meaning with consequent ambiguity and controversy. 

What is it, then, that we do get from our indices of reliability
or homogeneity? It is apparent that we can have no clear index
of either the precision of a test score or the homogeneity of a
test from conventional testing methods. Every index designed
to represent one or the other actually represents a joint effect.
The various indices merely differ in the nature of their approximation,
then, to $V_i$, the left-hand side of equation (8), summed
over all individuals.

Inasmuch as this $V_i$ is also the variance of an individual’s
score just as one of its components, $D_i$, is, one might ask what
the difference is between them. The difference is that $D_i$, the
variability within an individual, is the degree of precision of a
score on the test. $V_i$, the left-hand side of the equation, is the
precision of the individual’s score on the attribute, the domain
which the sample of items represents. Obviously, the homogeneity
of the items in a test has nothing to do with the precision
of a score on the test. But, obviously, this same score,
when regarded as an estimate of the individual’s score on the
domain or attribute of which the items constitute a sample, is
dependent upon the homogeneity of the domain. The greater
the homogeneity of the domain, the more alike will be the
scores of an individual on successive samples of items from that
domain.


IV. Next Steps 

As we see some of the implications of this for the further
development of test theory, there appear to be three general
alternatives, the first of which has two sub-alternatives:

1. Continue with the method of single stimuli as a method
of collecting data. Then we can do one of two things: (a) make
the necessary assumptions to achieve an interval scale and
hence have numbers to manipulate, 8 or (b) drop the assumptions
which lead to an interval scale and substitute Lazarsfeld’s
latent structure analysis (4). The first sub-alternative above is
to continue in the conventional manner. This will permit easily
accomplished empirical studies in which we could rarely have
firm confidence and unambiguous interpretation. The second
sub-alternative requires going in an entirely new direction.
Lazarsfeld’s latent structure analysis is a non-metric theory for
the scaling of data collected by the method of single stimuli.
Obviously, his theory could be taken over bodily by test
theorists, although from a practical point of view there are
still computational hurdles. Such difficulties, however, are mere
mechanical limitations and are not defects of the theory.

2. A second general alternative is to discover or to develop
a new method for collecting data which would enable us to put
the items in rank order for each individual as to how well he
passed them and how badly he failed them. If we could collect
such data we would then have data which, with very simple
assumptions, contain information about metric relations between
stimuli and individuals (1).

3. A third alternative is to discover or to develop a new
method for collecting data which would be equivalent to the
method of paired comparisons. This would require repeated
independent responses to each stimulus. Such data would contain
information on the metric relations between stimuli and
individuals, and, in addition, information on the two components
of precision and homogeneity, making a precise distinction
between them possible.

8 A better sub-alternative here is to experimentally validate the assumptions of
an interval scale if this is possible.


V. Summary 

We have tried to show that the assumptions required for an
interval scale and the identification of indices with concepts
are serious obstacles to the further development of test theory.
We have then developed a rational basis for defining the difficulty
of a test item for an individual and, from this basis,
developed mathematical expressions for the concepts of reliability
and homogeneity. It was then made apparent that the
measurement of reliability and homogeneity from the analysis
of data collected by the method of single stimuli is not possible,
as such data do not contain the necessary information. Several
alternative directions for the further development of test theory
are pointed out.
PROBLEMS IN MEASURING THE EFFECTIVENESS
OF PROFESSIONAL EDUCATION 


DONALD K. BECKLEY 
Simmons College 

In the process of completing a recent study of the effective¬ 
ness of one area of professional education, a number of prob¬ 
lems arose that may well be of interest to others planning in¬ 
vestigations of a similar nature. For this reason, this article 
has been prepared to describe some of these problems and the 
methods by which they were met. The study concerned was 
made to ascertain the effectiveness of college training for ex¬ 
ecutives in retailing in terms of selected objectives determined 
to be desirable. To do this, the performance of retailing gradu¬ 
ates in respect to these objectives was measured by means of 
an achievement examination and compared with the perform¬ 
ance of other groups. 

Selecting Groups for Comparison 

A question arose at this point of what groups to use for pur¬ 
poses of comparison. It was recognized that, as in other areas 
of professional and vocational education, objectives thought 
to be desirable might very possibly be attained by means of 
work experience as well as through formal college training. A 
study of this nature could be helpful in identifying those ob¬ 
jectives that could best be taught by means of formal college 
training and those for which work experience itself was best 
suited. 

In appraising the effectiveness of formal training, it was thus
necessary to take into consideration both formal training and
work experience as factors to be measured in respect to achievement
of the selected objectives. In order to have these two factors
appear in all possible combinations, it was necessary to
find subjects in each of these four groups: (1) no training, no
work experience, (2) training, no work experience, (3) work
experience, no training, and (4) both training and work experience.
Subject groups who met these requirements were
obtained through the use of these categories, in which the distinguishing
characteristics are the presence or absence of the
two factors:

1. Incoming students at the Simmons College Prince School
of Retailing who have neither studied retailing in formal
courses nor had extensive work experience.

2. Students who have completed the course in retailing at
the Prince School of Retailing, but have not yet had extensive
work experience.

3. Employees in Boston stores who are in positions of the
kind graduates soon will be taking, but who have had no
formal retail training.

4. Store executives and junior executives who have had a
specified amount of store experience and also are graduates
of the Prince School of Retailing.

Because two programs of retail training are offered at Simmons
College where this study was made, it seemed appropriate
also to consider educational level as another factor.
Hence, within each of the four groups were two sub-groups,
one consisting of students who had completed a four-year
undergraduate college liberal arts program, and the other including
those who had spent only two years in liberal arts
study before beginning their retail training.

The purpose of the study, then, was to determine the strength
of these three factors: (1) formal retail training, (2) retail
work experience, and (3) undergraduate college education in
respect to achievement of selected retailing objectives. The
hypothesis to be applied was that groups initially comparable
in all respects but differing in their treatment should reflect
differences in achievement that are the result of that particular
treatment.

The nature of the experiment can best be indicated by ar¬ 
ranging the data in the following design: 

                       No training                     Training
                2 yrs. coll.   4 yrs. coll.   2 yrs. coll.   4 yrs. coll.

No experience      N = 36         N = 30         N = 29         N = 18

Experience         N = 29         N = 32         N = 12         N = 10
The basic test-score data are presented in Tables 1 and 2,
in which the letters refer as follows: (a) no training, no work
experience, (b) training, no work experience, (c) work experience,
no training, and (d) both training and work experience.
Numeral 1 refers to students with four years of college preparation,
and numeral 2 refers to students with two years of college.

It was recognized that any statistical design selected could
not be adequately precise when uncontrolled variables still
remained. In this study, intelligence of the subject was measured
by the Wonderlic Personnel Test, and sex differences were
eliminated by having only women as the subjects. Although
the age levels of the two sub-groups differ by several
years by definition, calculation of critical ratios indicated that
none of the differences between the means of the various groups
were significant, thus minimizing age as a factor here.


TABLE 1

Means of Scores on Retailing Examination

                                   Test Scores
Group     N     Total       I       II      III      IV       V

a-1      30     41.47     9.13     8.07     9.07    8.77    3.80
b-1      18     59.57    12.72    11.54    14.39   13.50    7.41
c-1      32     50.01    11.75     9.34    11.21   10.41    7.91
d-1      10     60.10    12.70    11.30    14.00   12.70    8.80
a-2      36     36.41    10.01     7.86     7.67    8.39    5.06
b-2      29     56.17    13.13     9.54    14.17   11.34    7.82
c-2      29     44.31    11.03     7.58    10.14    8.62    6.38
d-2      12     50.25    11.66     8.33    12.33   11.33    5.58


TABLE 2

Standard Deviations of Scores on Retailing Examination

                           Test Scores
Group     N       I       II      III      IV       V

a-1      30     2.11     2.02     2.02    3.07    2.21
b-1      18     1.43     1.68     1.97    2.23    1.86
c-1      32     2.09     1.73     2.81    3.09    2.95
d-1      10     1.62     2.05     2.16    2.61    1.54
a-2      36     1.62     2.46     2.26    2.24    2.09
b-2      29     1.18     2.20     1.69    2.19    1.53
c-2      29     1.71     2.40     2.90    1.47    2.59
d-2      12     2.09     4.13     1.95    1.57    2.69


Selecting Subjects for Administration 

Some difficulties were encountered in obtaining an adequate
sample of subjects in all of the groups. Categories 1 and 2
consisted of incoming and outgoing students, hence they were
readily available to take the retailing examination. Through
the cooperation of a large Boston department store, a comparable
number of subjects in group 3 was made available.
In the case of group 4, with both formal training and work
experience, obtaining subjects was more difficult. In order to
have work experience comparable in amount and degree to
that of subjects in group 3, it was necessary to select graduates
of the School who had been working for approximately one to
two years. Because of the small number of graduates with
this amount of experience, the total number of possible subjects
was definitely limited. A practical difficulty faced here was that
most of the 34 eligible subjects lived away from Boston, and,
in fact, covered most parts of the United States. It was not practicable
to talk with them in person, or to administer the examination
personally, as was done with the other groups, and
the only feasible method of reaching them was by mail. A
letter was sent to each of these people requesting her assistance
and enclosing the examination materials together with
detailed directions as to the procedures to be followed. A follow-up
card was sent to those who did not return the completed
materials by the date suggested, and the final return consisted
of 22 cases.

Because of the nature of the questions asked in the retailing
examination, it seemed unlikely that more than a few of the 84
objective questions could be answered readily through the use
of notes or texts. In view of the explanation that the average
scores of each group rather than individual scores were of interest
in the investigation, it further seemed unlikely that any
of the subjects would have sought to use outside help in answering
the questions. In the case of the Wonderlic Personnel
Test there was the question of whether or not the subjects
had adhered to the specified time limit. Each score was checked
in terms of the subject’s previous academic performance as a
student, and any earlier intelligence scores available. In the
case of two subjects whose earlier record did not seem to justify
the very high intelligence scores received, deductions were
made arbitrarily to make their scores approximate the mean
of the group excluding these two scores, where they would not
influence the group computations. In all other cases, the scores
received appeared by inspection of the records available to be
entirely probable, and hence were accepted as having been
obtained under the conditions specified.


Checking the Reliability of the Examination 

When the examination in retailing had been constructed,
the question arose as to what measure to use in determining its
reliability. This question might better have been considered
before rather than after the examination was made. Because
of the conditions under which this examination in retailing was
developed and was to be administered, it was not feasible to
measure reliability through the use either of a retest or of equivalent
forms. Thus, it appeared that some use of the split-half
technique or application of the Kuder-Richardson formulae
was appropriate here. Originally the split-half technique was
rejected because the examination had not been properly
planned for the measurement of reliability, and there would
have been an item discarded from each of several sub-groups
when the odd and even items were matched. The Kuder-Richardson
formula number 20, which gives an estimate of the
reliability of a test when the number of items, the standard
deviation, and the average variation of the items are known, 1
has been described as superior to coefficients obtained by the
split-half method, because any error due to bias in splitting a
test is eliminated. 2

1 See Kuder, G. F. and Richardson, M. W. “The Theory of the Estimation of Test
Reliability,” Psychometrika, II (1937), page 158.

2 See Jackson, R. W. B. and Ferguson, G. A. Studies on the Reliability of Tests,
Bulletin No. 12, Dept. of Educational Research. Toronto: University of Toronto.

Because the examination in retailing was divided into five
sets of items representing the five objectives being measured,
it was desirable to estimate reliability coefficients for each
objective separately. Similarly, the four groups to whom the
examination was administered were different, and also were
treated separately. Except for group 2, students who were
tested at the time they were finishing their course in retailing
and thus a highly homogeneous group in respect to test performance,
all groups had reliability coefficients ranging between
.514 and .900. When reliability was calculated by the
split-half method in spite of the objection mentioned earlier,
the coefficients for group 2 were shown to be higher than
originally calculated, and within the range indicated for the
other groups, thus leading to the conclusion that the examination
was adequately reliable for group use.
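The article does not reproduce the Kuder-Richardson formula it relies on. As a rough illustration only, the sketch below computes the coefficient in the form in which formula 20 is commonly stated, r = (k / (k - 1)) (1 - Σpq / σ²), for items scored 1 or 0. The function name `kr20`, the use of the population variance of total scores, and the tiny response matrix are assumptions made for the example, not data from the study.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous items.

    responses[person][item] is 1 for a correct answer and 0 otherwise.
    """
    n_persons = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_persons
    # Population variance of the total scores (an assumption of this sketch).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_persons  # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)

# Hypothetical answers of four examinees to three items.
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
coefficient = kr20(answers)
print(coefficient)
```

Applied separately to each set of items and each group, a function of this kind would yield one coefficient per objective per group, which is the layout of results the paragraph above describes.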

Planning an Experimental Design 

Perhaps the most important problem in undertaking a statistical
study is the selection of an experimental design with a sufficiently
high degree of precision to answer the questions desired.
The problem here was to select a design to indicate
whether or not differences in gains among groups of students
were greater than would be expected from the operation of
chance factors alone.

A technique often used in investigations such as this is the
matching of pairs. It would have been possible to match pairs of
cases within each pair of groups in this experiment, but the unequal
number of cases would have proved to be a disadvantage
in that many cases in the larger groups would be left over after
pairs were matched.

One technique appropriate for use in this type of experiment
is the analysis of variance. As described by Lindquist, 3 the
variance of a sample can be analyzed into two components: the
within-groups variance and the between-groups variance. If
the hypothesis of random sampling is correct, the two estimates
of variance would normally differ only by chance. The F test,
known also as the variance ratio, indicates at the desired level
of significance whether or not the two estimates differ by more
than chance would allow. If so, there is reason to believe the
hypothesis to be false.
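The variance ratio described above can be sketched for a one-way classification as follows. The function name `variance_ratio` and the three small groups of scores are invented for illustration; looking up the resulting ratio in an F table at the chosen level of significance is left out.

```python
def variance_ratio(groups):
    """Ratio of the between-groups to the within-groups variance estimate."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three groups of three subjects each.
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
F = variance_ratio(example_groups)
print(F)
```

Here the between-groups estimate is 13 and the within-groups estimate is 1, so the ratio is 13 on 2 and 6 degrees of freedom.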

In this experiment, however, it seemed especially desirable
to ascertain the strength of the relationship among the factors.
This measurement was not available through the use of analysis
of variance, and the Peters’ regression technique was used. The
covariance technique could be used to account for the initial
lack of equivalence of groups and also in estimating the reliability
of differences between the adjusted final means. The
Peters’ technique seemed preferable, however, because it provided
an index of the strength of the relationship comparable to
a coefficient of correlation. This involved the matching of the
experimental and control groups through use of a regression
technique which does not require pair-by-pair matching. This
treatment made it possible to know whether the three experimental
groups did better on the achievement examination in retailing
than would be expected in view of their intelligence test
scores. The hypothesis tested here was that there were no real
differences produced by the factors introduced, and that any
differences in final mean scores, after allowances had been made
for chance differences in initial mean scores, were due entirely
to chance fluctuations in random sampling. 4

3 Lindquist, E. F. Statistical Analysis in Educational Research. Boston: Houghton
Mifflin Company, 1940.

This technique has been described by Peters 5 as follows:

The method involves setting up a regression equation in rectilinear
form based on the statistics of the control group, then
predicting by it what should be the achievement scores of the
members of the experimental group if they were just like the
control group members; if, that is, the experimental factor produced
no differential effect. We can, then, determine the differential
effect for the experimental factor by the extent to
which the average achievement of the experimental group exceeded
or fell short of that predicted for it by the regression
equation.

While similar in many respects to Fisher’s covariance tech¬ 
nique, the Peters’ technique makes the regression equation from 
the statistics of the control group rather than from the experi¬ 
mental and control groups pooled, on the ground that a pooled 
estimate would be a meaningless hybrid if the two groups dif¬ 
fered by reason of the experimental factor, as probably would 
be the case. 6 
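The core of the procedure quoted above can be sketched in a few lines: fit a straight line to the control group alone, predict the experimental group's achievement from it, and average the differences between actual and predicted scores. The function names `fit_line` and `differential_effect` and all scores below are hypothetical, and the special standard-error formula needed to test the difference is not attempted here.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def differential_effect(control_x, control_y, exp_x, exp_y):
    """Mean of (actual - predicted) achievement in the experimental group.

    The regression equation is built from the control group only.
    """
    a, b = fit_line(control_x, control_y)
    predicted = [a + b * xi for xi in exp_x]
    return sum(yi - pi for yi, pi in zip(exp_y, predicted)) / len(exp_y)

# Invented intelligence-test and examination scores, for illustration only.
control_iq, control_exam = [20, 25, 30], [30, 35, 40]
trained_iq, trained_exam = [22, 28], [45, 53]
effect = differential_effect(control_iq, control_exam, trained_iq, trained_exam)
print(effect)
```

Because the line is fitted to the control group alone, no experimental case has to be discarded for want of a matched partner, which is the advantage claimed for the technique.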

The use of the regression technique is especially appropriate
here, since it is recognized that there is a positive correlation between
academic aptitude or intelligence, particularly verbal
ability, and scores on the retailing examination. In this experiment,
a regression equation was calculated from the scores obtained
on an intelligence test and the retailing examination by
the control group with neither retail training nor work experience.
This equation was then used to predict the retailing examination
score from the intelligence-test score for those in each of
the three other groups, which were regarded as experimental
groups. This predicted score was then compared with the actual
score of each case in the experimental groups, and the significance
of the difference between means of predicted and actual
scores was tested for each objective in each sub-group separately.

4 Ibid., p. 181.

5 Peters, C. C. “A Method of Matching Groups for Experiment with no Loss of
Population.” Journal of Educational Research, XXXIV (1940), 70-74.

6 Peters, C. C. et al. “Research Methods and Designs.” Review of Educational Research,
XV (1945), 377-393.

A major problem in this connection concerned the standard
error formula appropriate for use here. In this situation the
different numbers of cases in the control and experimental
groups do not affect the standard error formula, but the difference
in the means of the matching scores of the control and
experimental groups requires an adjustment for that difference.
Thus, instead of the conventional formula for calculating the
standard error of the difference between means, a special
formula as stated by Peters and Van Voorhis 7 must be used
because the groups are not perfectly equated on the basis of
the matching factors. The differences between the means attained
in the various tests were then divided by the standard
errors of the differences in order to determine the t-ratios.

The Peters’ regression technique, described above, served to
indicate clearly the level of significance of the mean differences
in achievement scores when the groups were equated for intelligence,
but it did not identify the relative strength of the
factors being measured. The problem thus arose of how to
measure the magnitude of the relationship between achievement
in retailing and the several factors to be isolated: retail training,
work experience, and college education. Some measure of correlation
was needed here to indicate the strength of relationship
between achievement and each of these factors with the other
factors held constant.

1 Peters, C. C. and Van Voorhis, W. R. Statistical Procedures and Their 
Mathematical Bases. New York: McGraw-Hill Book Company, 1940. 

The Kelley correlation ratio, ε, was found to be an 
appropriate statistical treatment for this purpose, particularly 
because it is not affected by disproportionate numbers of cases 
in the various groups. As described by Peters and Van Voorhis, 8 
when corrected, ε has a standard meaning free from bias and 
independent of the size of the population of the sample and of 
the number of classes into which the sample is divided. It has 
been shown to have all the merits of analysis of variance, and, 
in addition, is interpreted positively rather than negatively, as 
in the case of the t- and F-scores involving the null hypothesis. 
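
As a rough illustration of the corrected correlation ratio, the sketch below uses the form commonly attributed to Kelley, ε² = 1 − [(N − 1)/(N − k)](1 − η²), which is the same as one minus the ratio of the within-class mean square to the total mean square. It is an assumption, not something the text confirms, that this is the exact computation used in the study; the function name and the scores are hypothetical, and negative values of ε² are set to zero here purely for convenience.

```python
import math


def corrected_correlation_ratio(groups):
    """Return the corrected correlation ratio (epsilon) for a list of score lists.

    Each inner list holds the scores of one class; classes may differ in size.
    """
    scores = [x for group in groups for x in group]
    n = len(scores)   # total number of cases, N
    k = len(groups)   # number of classes, k
    if k < 2 or n <= k or any(len(group) == 0 for group in groups):
        raise ValueError("need at least two non-empty classes and more cases than classes")

    grand_mean = sum(scores) / n
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    if ss_total == 0:
        raise ValueError("all scores are identical; the ratio is undefined")

    # Sum of squares of scores about their own class means.
    ss_within = 0.0
    for group in groups:
        class_mean = sum(group) / len(group)
        ss_within += sum((x - class_mean) ** 2 for x in group)

    # epsilon^2 = 1 - (within-class mean square) / (total mean square)
    epsilon_squared = 1.0 - (ss_within / (n - k)) / (ss_total / (n - 1))
    return math.sqrt(max(epsilon_squared, 0.0))


# Hypothetical achievement scores for three classes of unequal importance.
epsilon = corrected_correlation_ratio([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Because both sums of squares are divided by their degrees of freedom, the result does not shrink or swell merely because the classes are more numerous or unequal in size, which is the property the text relies upon.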

A problem, however, was how to set up the data in this study 
to make possible meaningful analysis. One plan used was to set 
up direct comparisons of various pairs of subject groups in order 
to isolate each of the three factors to be measured. For example, 
to isolate the factor of formal training, group 1 (no work, no 
training) was compared with group 2 (no work, training); and 
group 3 (work, no training) was compared with group 4 (work, 
training). By this kind of classification, direct comparisons were 
made between various pairs of groups, thus holding constant 
the factor present in or absent from both groups. 

Although useful to some extent in measuring strength of 
relationship of the various factors, the ε treatment described 
above was not entirely satisfactory, and some further 
classification was sought whereby two of the three factors to be measured 
could be isolated simultaneously while the strength of the third 
factor was being measured. As described by Peters and Van 
Voorhis, 9 there is a technique through which subjects can be 
sorted into classes on the basis of some known factor, and then 
subsorted into sub-classes. The variance of these sub-classes 
will be due to factors other than those which determine the class 
sorting. This treatment, which, in effect, is partial ε, was used 
in the study being described. Because two factors were to be 
held constant, it was necessary to sub-subsort the data. For 
example, to find the partial ε for education on achievement, 
with training and work experience held constant, the following 
classification was made: 


             Work Experience                          No Work Experience
      Training             No Training             Training             No Training
2 yrs. c.  4 yrs. c.   2 yrs. c.  4 yrs. c.   2 yrs. c.  4 yrs. c.   2 yrs. c.  4 yrs. c.

Through the calculation of corrected ε, it was possible to 
compare directly the strength of the three factors, and thus to have 
some statistical basis for noting the relative importance of these 
factors in respect to each of the objectives being measured. 

The conclusions reached from this study were as follows: 

1. The proposed theory of retail training, that work 
experience alone can be more effective than formal training alone in 
teaching specific job techniques, was not substantiated in respect 
to the objective: cultivation of skills in the use of retailing 
mathematics. Formal training alone was found to be approximately 
equal in effectiveness to work experience alone in this area. 

2. The presumption that the combination of formal training 
and work experience together would prove more effective than 
either training or work experience alone was not consistently 
borne out, possibly because of limitations in the size of the 
sample studied. Although subjects in this group performed sig¬ 
nificantly better than the control group in the case of all but 
two sub-groups, these subjects did not consistently show sig¬ 
nificantly greater differences as compared with subjects with 
training or work experience alone. Many of the subjects tested 
had been working since graduation in personnel positions which 
did not directly involve customer contact or the use of mer¬ 
chandising mathematics, and the data suggest that as with 
training in other fields, people remember best those kinds of 
learning with which they are most directly interested or em¬ 
ployed. 

3. Of the five objectives measured, work experience was 
shown to be relatively the most effective in: (1) skill in the use 
of retailing mathematics, and (2) identification of retailing 
facts. Work experience was least effective in teaching the com¬ 
prehension of the nature of distribution. As indicated above, 
work experience equalled formal training in effectiveness only 
in respect to skill in the use of retailing mathematics. 

4. Subjects with four years of liberal-arts-college education 
were better prepared to be effective retail executives than those 
subjects with two years of liberal-arts-college work, except in 
the case of the objective: application of principles of retail 
management, where no significant relationship exists. 



THE CONCEPT OF VALIDITY IN THE INTERPRETATION 
OF TEST SCORES 


ANNE ANASTASI 
Fordham University 

If asked to define “validity,” most psychologists would probably 
agree that validity is the closeness of agreement of a test 
with some independently observed criterion of the behavior 
under consideration. It is only as a measure of a specifically 
defined criterion that a test can be objectively validated at 
all. For example, unless we define “intelligence” as that com¬ 
bination of aptitudes required for successful school achieve¬ 
ment, or for survival on a certain type of job, or in terms of 
some other observable criterion, we can never either prove 
or disprove that a particular test is a valid measure of “intelli¬ 
gence.” The criterion may be expressed in very broad and 
general terms, such as “those behavior characteristics in which 
older children in our culture differ from younger children reared 
in the same culture,” but, however expressed, it defines the 
functions measured by the particular test. To claim that a 
test measures anything over and above its criterion is pure 
speculation of the type that is not amenable to verification 
and hence falls outside the realm of experimental science. 

To the question, “What does this test measure?”, the only 
defensible answer can thus be that it measures a sample of 
behavior which in turn may be diagnostic of the criterion 
or criteria against which the particular test was validated. 
Nor is there any circularity implicit in such a definition of 
validity, since a psychological test is a device for determining 
within a relatively short period of time what could otherwise 
be discovered only by means of a prolonged follow-up. For 
example, with a psychological test we may be able to predict 
within a certain margin of error which applicants will succeed 
on a given job or which students will be able to complete 
a medical course satisfactorily. Logically, the same information 
could have been obtained, even more precisely, by hiring all 
job applicants or admitting to medical school all students 
wishing to enroll, and observing the subsequent performance 
of each subject. The latter procedure is obviously so time- 
consuming and wasteful, however, as to be completely im¬ 
practicable. Hence the tests make a real contribution in per¬ 
mitting predictions in advance of lengthy observations. Another 
advantage of standardized psychological tests is that they make 
possible a comparison of the individual’s performance with 
that of other persons who have been observed in the same 
sample situation represented by each test. In other words, 
the tests provide norms for evaluating individual performance. 

Prediction and comparison with norms represent valuable 
contributions which psychological tests can render to our knowl¬ 
edge of individual behavior, the practical benefits of these 
contributions having been widely demonstrated. It is of funda¬ 
mental importance, however, to bear in mind that psychological 
tests do not provide a different kind of information from that 
obtained by any other observation of behavior. The use of 
such labels as “intelligence,” “aptitude,” “capacity,” and 
“potentiality” has probably done much to make test users lose 
sight of the empirical validation of tests. A number of current 
disagreements regarding the interpretation of test results and 
the susceptibility of tested abilities to training may be trace¬ 
able to a failure to take due cognizance of validation procedures. 
Many test users apparently give only preliminary and possibly 
perfunctory attention to validation data, in order to reassure 
themselves at the outset that the test is “satisfactory.” Their 
interpretation of the scores obtained with such a test, however, 
often takes no account of the validation data and is expressed 
in terms which bear little or no relation to the criterion. 

Perhaps one of the most common examples of such an in¬ 
consistent treatment of test validity is provided by what we 
may call the argument of “extenuating circumstances.” Let 
us suppose that a child obtains an IQ of 58 on a verbal intelli¬ 
gence test, and that the examiner subsequently finds evidence 
of a fairly severe language handicap in this child owing to 
foreign parentage. It is a common practice to conclude in 
such a case that the obtained IQ is not “valid,” on the grounds 
that the verbal content of the test rendered it unsuitable for 
testing such an individual. At this point we may inquire, 
however, “On the basis of what criterion is this IQ invalid?” 
Certainly the obtained IQ may be a valid measure of the 
behavior defined by the criterion against which the particular 
test was validated. It is very likely that the same language 
handicap which interfered with performance on this test will 
interfere with the child’s behavior in other linguistic situations 
of which this test is an adequate index. The correspondence 
with the criterion may thus be just as close for this child as 
for children without a language handicap. In school, for ex¬ 
ample, the language handicap would probably interfere with 
the child’s acquisition of important skills and information. 
The resulting academic backwardness, together with the origi¬ 
nal language handicap itself, would, in turn, affect certain 
aspects of job performance and other areas of adult activities. 
Conversely, any remedial efforts designed to eliminate the 
language handicap would produce an improvement, not only 
in the tested IQ, but also in the broader area of behavior 
of which this test is a predictor. 

It should be added parenthetically that language handicap 
has been chosen as an example only for purposes of discussion. 
A number of other “extenuating circumstances,” such as visual 
or auditory defects, emotional and motivational factors, in¬ 
adequate schooling, and the like, could have served equally 
well to illustrate the point. Similarly, the discussion has been 
limited to intelligence tests, since it is chiefly in connection 
with these tests that many confusions regarding validity have 
arisen. The entire discussion applies equally well, however, 
to all types of psychological tests. 

Specifically, how does the case cited in our illustration, as 
well as others of its type, differ from those in which no question 
is raised regarding the "validity” of the test or its applicability 
to the particular individual? First, in the present case the 
examiner has direct and certain knowledge regarding at least 
one of the factors which determine the subject’s subnormal 
performance, viz., language handicap. In other cases, the 
principal determining factor might be inferior schooling facilities, 
parental illiteracy, cerebral birth injuries, a defective thyroid, 
or any of a large number of psychological or biological 
conditions. Yet it is doubtful whether the IQ would be considered 
“invalid” in all of these cases simply because it proved possible 
to point to a specific condition as the determining factor in 
the poor test performance. To be sure, in many cases of low 
IQ, the examiner has little or no knowledge about the cir¬ 
cumstances or conditions which lead to the intellectual back¬ 
wardness. But such ignorance is obviously no more conducive 
to “valid” testing. Quite apart from the question of validity, 
the examiner should, of course, make every effort to understand 
why the individual performs as he does on a test. The 
fullest possible knowledge of the individual’s pre- and post¬ 
natal environment, structural deficiencies, and any other rele¬ 
vant conditions in his reactional biography is desirable for the 
most effective use of the test data. But to explain why an in¬ 
dividual scores poorly on a test does not “explain away” the 
score. There are always reasons to account for an individual’s 
performance on a test. Language handicap is just as real as 
any other reason. 

A second distinguishing feature of our example is that such 
a language handicap is usually remediable. The individual need 
not be permanently backward in intellectual performance, but 
with special training he may in large measure compensate 
for past losses in intellectual progress. Susceptibility to treat¬ 
ment is, however, a matter of degree. Many of the conditions 
determining intellectual performance, whether structural or 
functional, are amenable to change under special treatment. 
Moreover, conditions for which no effective therapy is now 
known may yield to newly developed treatments in the future. 
The distinction in terms of remediability is thus rather tenuous. 
Nor does such a distinction have any direct bearing upon the 
validity of a measuring instrument. A thermometer may be 
a valid index of fever, despite the fact that the administration 
of medicine will cure the fever. 

Thirdly, some may point out that language handicap is 
not hereditary and may maintain that for this reason its 
influence upon test performance ought to be “ruled out.” Such 
an objection contains a tacit assumption that psychological 
tests are primarily concerned with those individual differences 
in behavior which can be attributed to heredity. Since the 
number of hereditary conditions which have been clearly re¬ 
lated to behavior differences are extremely few, such a policy, 
if followed consistently, would mean the virtual cessation of 
psychological testing. Moreover, the connection between hered¬ 
itary mechanisms and behavior is so remote and indirect as 
to render the distinction between hereditary and environmental 
factors in behavior largely an academic one (cf., e.g., 2). Above 
all, it should be noted that no criterion against which any 
psychological test has been validated is itself traceable to purely 
hereditary factors. Hence no such test has been proved to be 
a valid measure of individual differences in hereditary charac¬ 
teristics. 

A fourth point to be considered is that of comparability. 
It may be objected that the individual who is handicapped 
by language difficulties, sensory deficiencies, or similar 
“extenuating circumstances” is not comparable to the validation 
group on which the test norms were established. The 
requirement of comparability in the application of psychological tests 
needs further clarification. If individuals are entirely similar 
in all of the conditions (psychological, physiological, etc.) which 
influence the behavior measured by a particular test, individual 
differences will disappear, all subjects receiving the same score. 
Obviously no test is designed to measure behavior independ¬ 
ently of the conditions which determine such behavior—that 
would be a logical absurdity as well as an empirical impossi¬ 
bility. When the conditions in which the individual differs 
from the standardization group affect the test and the criterion 
in an approximately equal manner and degree, the validity of 
the test for that individual will not be appreciably influenced 
by the lack of comparability of the individual to the standard¬ 
ization group. 

This question of "comparability" pertains not so much to 
the measurement of behavior as to the analysis of the etiology 
of behavior differences. It is only when attributing the observed 
individual differences in test scores to a particular factor or 
class of factors that the investigator must make certain that 
other contributing factors have been reasonably constant. For 
example, if a few individuals in a group have a language 
handicap while the rest do not, we could not ascribe individual 
differences in performance within this group to structural dif¬ 
ferences in the nervous system, or to any other factor whose 
contribution to behavior we may be investigating. The same 
limitation would apply, however, if educational opportunities, 
family traditions, incentives for intellectual activities, or any 
other factor were not held constant. The fact that the influence 
of language handicap, sensory deficiencies, and a few other 
conditions is more readily apparent does not place such con¬ 
ditions in a different category. The question of comparability 
applies equally to all conditions other than the one under 
investigation. 

A fifth consideration pertains to the use of test scores in 
prediction. Could an IQ obtained by a child with a language 
handicap serve as a basis for predicting the subsequent be¬ 
havior of the individual? As long as the language handicap 
remains, the test score can provide an accurate prognosis of 
the child's behavior in situations demanding the type of verbal 
responses sampled by the test. It is only in this sense that any 
psychological test makes predictions possible. Within a certain 
margin of error, behavior can be predicted under existing con¬ 
ditions. But if, for example, any detrimental conditions such 
as poor schooling, sensory deficiencies, or the like are corrected, 
then performance on both test and criterion will show improve¬ 
ment. In discussions of test reliability, various writers during 
the past twenty-five years have pointed out that a psychological 
test should be expected to reflect changes in behavior at different 
times and under different conditions.1 For test scores to 
remain constant when conditions affecting the subject's be¬ 
havior have altered would indicate a crude and relatively in¬ 
sensitive measuring instrument, rather than a highly “reliable” 
one. The same logic applies to validity. If the subjects' test 
scores remain unchanged despite the modification of conditions 
which affect criterion performance, the test cannot have high 
validity. 

1 Cf., e.g., 1, 4, 5, 6, 9, 10, 11, 12, 15, 18, 19. 

Closely related to the problem of prediction is the scope 
or breadth of influence of any given condition upon the 
individual’s behavior. For example, the presence of a loud, irregular 
noise during the testing would probably affect the score on 
that test, without influencing the individual’s behavior in other 
situations. A toothache or a severe cold on the day of the 
testing would be further illustrations of narrowly limited 
conditions. In the case of these conditions, the prognostic value 
of the test for the individual would indeed be reduced, in 
much the same manner that holding an ice cube in the mouth 
would invalidate an oral thermometer reading of bodily 
temperature. Conditions such as language handicap, however, affect 
the individual’s behavior in a much broader area than that 
of the immediate test situation. They may thus influence both 
criterion and test score in a similar manner. 

The import of the above analysis is that validity should 
be consistently interpreted with reference to the specific criteria 
against which the given test was validated. It also follows that 
validity is not a function of the test but of the use to which 
the test is put. A test may have high validity for one criterion 
and low or negligible validity for another. The attitude that 
a good test has "high validity” and a poor test has "low 
validity” is still too prevalent among test users. Tests cannot 
be validated in the abstract, nor is the usual concept of validity 
itself universally applicable to psychological testing. It is only 
when tests are employed for predictive or diagnostic purposes 
that the correlation with an external criterion is relevant at 
all. In many investigations concerned with fundamental be¬ 
havior research, tests are employed merely as behavior samples 
obtained under standardized (i.e., uniform) conditions, without 
reference to the correlations of these samples with other, “everyday-life” 
behavior samples (i.e., practical criterion measures). 
When the maze-learning behavior of white rats is tested, for 
example, the maze is not first "validated” against the rats’ 
success in finding food in a grocery basement, or their ability 
to avoid contact with prowling cats, or any other criteria of 
achievement in the rats’ extra-laboratory or workaday world. 
The investigator may quite reasonably argue that for the study 
of the particular principles of behavior which he is investigating, 
maze-learning is as “good” a sample of behavior as cat-avoiding, 
and that he has no more reason for validating the 
former against the latter than vice versa. 

Fundamentally, any validation procedure provides a measure 
of the relationship between two behavior samples. As Guilford 
has recently expressed it, “In a very general sense, a test is 
valid for anything with which it correlates” (7, p. 419). The 
process can be regarded as irreversible only when one of the 
behavior samples has greater importance than the other for 
a specific purpose.2 In such a case, the more important behavior 
sample is designated the “criterion.” No basic difference exists 
between “criteria” on the one hand and “tests” on the other. 
They are merely different samples of behavior whose 
interrelationships permit predictions from one to the other. We 
could predict intelligence test scores from school achievement, 
although the process would be needlessly time-consuming. In 
such a case, the intelligence test scores would constitute the 
criterion. 
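
The point that a validity coefficient is simply the correlation between two behavior samples can be illustrated with a short sketch. It computes the ordinary product-moment correlation of one set of test scores with two different criteria; the function name and all figures are hypothetical, chosen only to show that the same test may correlate highly with one criterion and negligibly with another, and that the coefficient is the same whichever sample is treated as the predictor.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two cases")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("a list with no variation cannot be correlated")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical scores for six subjects.
test_scores = [95, 100, 105, 110, 120, 130]
school_grades = [60, 64, 70, 71, 80, 88]   # one possible criterion
job_ratings = [3, 5, 2, 4, 3, 4]           # a different criterion

validity_for_school = pearson_r(test_scores, school_grades)
validity_for_job = pearson_r(test_scores, job_ratings)

# The coefficient does not depend on which sample is called the "criterion".
reverse_validity = pearson_r(school_grades, test_scores)
```

With these invented figures the test would be called highly valid as a predictor of school grades and nearly useless as a predictor of job ratings, although it is the same test in both cases.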

The criterion is not intrinsically superior in any sense. It 
is well known, for example, that many commonly used criteria, 
such as school grades or job advancement, may be influenced 
by many factors “extraneous” to the quality of the individual’s 
performance. Yet, if it is our object to predict such criteria, 
with all their irrelevancies and shortcomings, then the 
correlation of a given test with such criteria is the validity of the 
test in that situation. To be sure, the immediate criterion 
against which a test is validated may itself have been chosen 
as a convenient index or predictor of a broader and less readily 
observable area of behavior. For example, a pilot aptitude 
test may be validated against performance in basic flight 
training, the latter being in turn regarded as an approximate index 
of achievement in more advanced training and even possibly 
of ultimate combat performance. Such “successive validation” 
would be quite consistent with the relativity of predictors and 
criteria. It might be noted parenthetically that it is only when 
criterion measures are themselves used as predictors of further 
behavior that one may legitimately speak of the reliability 
and validity of the criterion itself (cf., e.g., 8). 


2 ... relationship between the two variables is curvilinear, prediction will not ... 

Validation against a “practical” criterion is essential for 
many uses to which tests are put. It should not be assumed, 
however, that only tests which have been validated against 
some criterion considered important within a particular cul¬ 
tural setting can be used in behavior research. In order to be 
able to generalize from any obtained test score, we need only 
to know the relationships between the tested behavior in ques¬ 
tion and other behavior samples, none of these behavior samples 
necessarily occupying the preeminent position of a criterion. 
Thus, if the investigator is interested in the possible use of 
maze-learning performance as a basis for predicting the rats’ 
behavior in other learning situations, he will have to correlate 
the subjects’ maze-learning scores with their scores in a variety 
of other learning tasks. If a common factor is identified through 
these different learning scores, the “factorial validity” (7) of 
any one of the tests in predicting that which is common to 
all of them can be determined. On the other hand, if no single 
learning factor is demonstrated, then the area within which 
predictions can be made must be accordingly narrowed to 
fit the confines of whatever common factor does become evident. 
Investigations conducted to date on human subjects, for ex¬ 
ample, have failed to indicate the presence of a common “learn¬ 
ing factor” (20, 21), and animal studies have revealed even 
greater specificity (cf., e.g., 14, 16, 17). But such specificity, 
if further corroborated, is an empirically observed fact whose 
discovery is useful in its own right in advancing our knowledge 
of behavior; it should not be construed as a weakness of the 
tests. 

Whether we are dealing with common factors and “factorial 
validity” or with "practical validity” in the prediction of every¬ 
day-life criteria, the question of validity concerns essentially 
the interrelationships of behavior samples. In the latter case, 
one sample is represented by the test and another, probably 
much more extensive sample, by the criterion. In the former 
case, the different tests which are correlated constitute the 
behavior samples. Nor should the terminology of factor analysis 
mislead us into the belief that anything external to the tested 
behavior has been identified. The discovery of a "factor” means 
simply that certain relationships exist between tested behavior 
samples. 

The common misconception that the criterion is in some 
mysterious fashion more basic than the test probably results, 
in part, from the belief that tests measure hypothetical 
“underlying capacities” which are distinguishable from observed 
behavior. Discussions of psychological tests often become 
hopelessly entangled because of the implicit supposition that tests 
can be validated against such underlying capacities as criteria. 
Any operational analysis of actual validation procedures 
reveals the futility and absurdity of such an expectation. 

In this connection we may consider a monograph by Thomas 
(13), which sounds a note of acute pessimism regarding the 
use of mental tests as “instruments of science.” Through a 
careful and systematic logical analysis, the author 
demonstrates the fallacies inherent in any attempts to interpret 
psychological tests as measures of “innate abilities,” hypostatized 
“fundamental human capacities,” and the like. He clearly 
recognizes that “the methodology of mental testing provides 
no way of operationally defining an ability and a performance 
as distinct ... entities” (13, p. 75). But, in his final conclusions, 
the author seems to exhibit the same confusions which he had 
previously sought to eliminate.3 For example, in the attempt 
to evaluate the scientific usefulness of psychological tests, he 
raises such questions as the following: “Do two identical scores 
mean that the same kind and amount of psychological processes 
were employed? Do they mean similar sociological backgrounds 
of experience? Do they mean a qualitatively similar adaptation 
to the immediate test environment? Do they mean that 
comparable amounts of psychic tension were built up or that similar 
amounts of nervous energy were expended?” (13, p. 77). By 
way of reply he adds: “The achievement of such scientific 
meanings as these from the current methodology of mental 
testing is probably too much to expect, for test results at 
present are notoriously ambiguous in what they signify about 
the socio-psychological ingredients of the recorded 
performances” (13, p. 77). 


3 These confusions in the fundamental argument do not detract from the value 
of certain more specific points discussed in this monograph, such as the limitations of 
ordinal scales, and the concepts of difficulty value and homogeneity in test construction. 
But these problems have also been analyzed by other writers, in a somewhat more 
constructive manner (cf., e.g., 3, 10). 

Two weaknesses are apparent in such an argument. First, 
the testing of behavior is being confused with an analysis of 
the factors which determine behavior. Secondly, despite his 
earlier advocacy of an operational definition of “ability,” the 
author now appears to be chasing the will-o’-the-wisp of 
“psychological processes” which are distinct from performance. He 
seems thus to be demanding that in order to be proper 
instruments of science, psychological tests should measure functions 
which by definition fall outside the domain of scientific inquiry! 

In summary, it is urged that test scores be operationally 
defined in terms of empirically demonstrated behavior relation¬ 
ships. If a test has been validated against a practical criterion 
such as school performance, the scores on such a test should 
be consistently defined and treated as predictors of school 
performance rather than as measures of hypostatized and 
unverifiable “abilities.” It is further pointed out that conditions 
which affect test scores may also affect the criterion, since both 
test scores and criteria are essentially behavior samples. The 
extent or breadth of such influences is a matter for empirical 
determination, rather than for a priori assumption. Moreover, 
the validity of a psychological test should not be confused 
with an analysis of the factors which determine the behavior 
under consideration. Finally, it should be noted that the dis¬ 
tinction between test and criterion is itself merely one of prac¬ 
tical convenience. The scientific use of tests is not predicated 
upon the assumption that criteria are a separate class of phe¬ 
nomena against which all tests must first be validated. Essen¬ 
tially, generalization and prediction in psychology require 
knowledge of the interrelationships of behavior, regardless of 
the situation in which such behavior was observed. 



THE LOGIC OF SCALE CONSTRUCTION 1 

EDWARD A. SUCHMAN 
Cornell University 

Most of the classifications used in the course of our daily 
communication with one another are not defined with any great 
exactitude. For ordinary purposes of communication, it is 
usually not necessary to formulate a set of rules to distinguish 
between those things which belong to a certain class and those 
which do not. Agreement as to what constitutes membership 
in a class of objects is common enough to permit understanding 
without resort to explicit classification schemes. People can talk 
and write about “beautiful women,” “successful men,” “good 
books” or “prosperous nations” without bothering to state the 
rules for their classifications. These “loose” classifications 
constitute an important part of our communicatory system. 

The Need for More Precise Classifications 

To the scientist, however, who must work with these classifi¬ 
cations, such loose usage often proves inadequate. Scientific 
communication demands a more rigorous statement of the bases 
for the classifications used. One of the tasks of the scientist 
becomes the translation of the loose descriptive terminology of 
ordinary social intercourse into the more precise classificatory 
systems of science. To the scientist the statements of Mr. Jones 
to Mr. Smith that “Mr. Brown is a successful lawyer,” or “Mr. 
Greene is an anti-Semitic person,” or “The United States is a 
prosperous country,” present problems in definition. What is 
meant by “a successful lawyer,” or “an anti-Semitic person,” 
or “a prosperous country”? 

The need for such precise definition becomes apparent in 
ordinary communication when there is a disagreement between 
Mr. Jones and Mr. Smith. This disagreement illustrates the 

1 The author wishes to acknowledge the valuable contributions of Paul F. Lazarsfeld
and Louis Guttman to the present formulation of the problem.

problem of communicatory classification which the scientist is
attempting to solve. Two persons who disagree on how to 
classify a third person or object find themselves faced with the 
difficult problem of defining the bases for their classification. 
To reach an agreement they are forced to tighten the loose 
classificatory system which usually suffices when both are in 
agreement. So long as both individuals agree that “Mr. Brown 
is a successful lawyer,” they will feel little need to define what 
they mean by “successful.” However, when a disagreement 
occurs, they are forced to state more precisely what they mean 
by “successful” or “unsuccessful.” This transition from a loose 
classification to a more rigorous classification constitutes one of 
the most important tasks of the social sciences. How can this 
transition be accomplished? 

The Problem of Scale Construction

The efforts of social scientists to define the meaning of some
attribute or variable in such a way as to permit the classification
of persons or objects according to the degree to which that
attribute is present or absent constitute the problem of scale
construction. As stated by Lundberg, there are two principal
aspects to this problem: "(1) How shall we select the aspects or
factors of a unit which we deem significant and which are
therefore to be considered in our scale? (2) How shall we
determine the relative weight to attach to each factor included?" 2
These problems of item selection and item weights occupy a 
central position in most current methods of scale construction. 

However, we propose to show that in the case of a uni-dimen¬ 
sional scale these two problems are actually non-existent. The 
theory of “scalability” to be developed is based upon the funda¬ 
mental concept that if an area is uni-dimensional, then (1) any 
series of items selected from that area is interchangeable with 
any other series of items, and (2) any set of weights given to a
series of items will produce the same rank order of objects or 
individuals as any other set of weights. The problem of scale
construction, therefore, takes the form of a test for uni-dimensionality,
rather than the arbitrary treatment of non-scalable
data as if it were scalable.

2 Lundberg, George. Social Research. New York: Longmans, Green & Co., 1941,
p. 459.

First, we will deal with the problem of item selection and, 
second, with the problem of item weights.

"Non-itemized" versus "Itemized" Classifications

An attempt to clear up the disagreement between Mr. Jones
and Mr. Smith discussed above may take two different lines of
development: (1) the introduction of additional judgments
from other persons, or (2) the listing of those items which serve 
to characterize the different classes. The first approach, that of 
"non-itemized” judgments or ratings, represents an attempt to 
reach an agreement based upon the opinions of other judges, 
without attempting to characterize or describe further the basis 
for the judges’ ratings. The second approach, that of "itemized” 
classification, requires the listing of a characterizing aggregate 
of items which serves as the basis for the classification to be 
made. 

Let us see how these two approaches would apply to the 
present problem. As an example of the first approach, Mr. 
Jones and Mr. Smith could attempt to settle their disagreement 
as to whether Mr. Brown is a "successful" lawyer by asking a
group of other people to classify Mr. Brown as "successful” or 
"unsuccessful.” The basis for agreement using this method 
might be the proportion of judges rating Mr. Brown as "success¬ 
ful” or "unsuccessful.” This form of classification we shall call a 
“non-itemized” classification. 

As a second approach, Mr. Jones and Mr. Smith could 
attempt to settle their disagreement by asking each other 
exactly what they mean by "successful” or “unsuccessful.” 
They would probably reply by pointing out certain character¬ 
istics of Mr. Brown which to each of them signify the presence 
or absence of "success.” The classification of "successful” is 
expanded by the introduction of such items as "He has money,” 
or "People listen to what he has to say,” or "He has written 
many books,” and other classificatory items characteristic of 
"success.” As more and more specific items are added to the 
general classification, the loose definition takes on a more precise
meaning. Specific actions or characteristics of Mr. Brown
are mentioned which afford the basis for the development of
classificatory techniques by means of which "successful" people
can be distinguished from "unsuccessful" people. The basis for
agreement using this method might lie in the number of characteristics
indicative of success which Mr. Brown possesses.
This form of classification we shall call an "itemized"
classification.

Thus, the need for a more precise classification, we have seen, 
can lead to the use of “non-itemized" judgments or to the use of 
"itemized" aggregates of characterizing attributes. Both methods
are currently being used by social scientists in their attempts
to classify data. Each method has its own particular set of
problems. The use of "non-itemized" judgments presents a
solution to the problem based upon ratings without any attempt
to produce a definition of the variable. The use of "itemized"
aggregates of attributes, on the other hand, attempts a solution 
to the problem based upon a meaningful definition of the 
variable. It is this latter method which will constitute the main 
focus of the present attempt to arrive at a logical basis for scale 
construction. 

Let us look at the first question, “How shall we select the 
aspects or factors to be considered in our scale?" 

The Concept of an "Itemized" Aggregate

An aggregate of items consists of a series of items which have 
been selected as characterizing some object or person. These 
characterizing items, as we shall see, form the basis for a 
rigorous system of measurement. The transition from a loose to 
a more precise classification, which is the task of the scientist, 
is accomplished through the organization of these characterizing 
items into coherent systems. 

The number of characterizing items that exist for any single 
variable is unlimited. Furthermore, there appears to be little 
inherent reason why any one item is better than any other. 
“Success” may be defined in any number of different ways. 
Theoretically there are an infinite number of classificatory items 
which may be used to distinguish a "successful" from an "unsuccessful"
person, no single one of which is inherently better
than any other. How can such an infinitely broad range of
characterizing items be brought into the reach of the scientist 
who desires to study them? 

This concept of a universe of items can be illustrated by 
examples from many different types of social phenomena. The 
construction of an index of purchasing power may include 
almost any sampling of characterizing items which come from 
the total universe of items characteristic of purchasing power. 
The classification of individuals according to social status may 
include a large group of characterizing items ranging from in¬ 
come to the number of books read. The judgment of individuals 
according to their ability to supervise men may include such 
diverse items as the amount of time spent talking to the men 
and the score received on an intelligence test. The intelligence
test itself is composed of a wide range of items. The ranking of 
people according to their attitude toward some issue is based 
upon their responses to a series of attitude items. All of the 
above areas are characterized by the use of a wide range of 
items in an attempt to arrive at a more precise classification of 
these areas. Another way of stating this would be to say that 
an attempt is made to classify social phenomena by observing a 
number of items which come from a universe of items char¬ 
acteristic of these phenomena. 

Sampling a Universe of Items 

We now come to an important aspect of this concept of a 
universe of items—the sampling of items from this universe. 
Since an unlimited number of items can be used to characterize 
a single concept, any definite number of items that are used 
must be a sample from this unlimited universe. Any single item 
that is used in practice is but a sample of one from this universe, 
and is interchangeable with any other item from the universe 
that might have been used in its place. The items used in a 
scale of attitudes toward war, an intelligence test, a social-status 
scale, a rating sheet on efficiency of workers, a standard of 
living index, a personality inventory, or in any classification 
device in the social sciences are only a selection from an infinitely
large number of similar items. Thus the practical problem
of classification in the social sciences becomes one of studying
a universe on the basis of a sampling of items from that
universe.

The concept of an aggregate of characterizing items, thus, 
conceives of a sample from an unlimited number of items which 
may be used to characterize any social phenomenon. The characterizing
universe consists of all items which can be used to
exemplify the social concept. The determination of whether or 
not an item belongs to a certain universe, however, remains a 
matter which must be decided upon by common agreement. 
A characterizing item belongs to a universe on the basis of some 
arbitrary decision as to its content. The universe itself is de¬ 
cided upon arbitrarily as the content of interest to the investi¬ 
gator. Some additional means, such as the consensus of judges, 
might be introduced to help the investigator, but the final 
decision of whether or not this item characterizes the universe 
or phenomenon of interest, must be a subjective one. 

As will be discussed in the next section, a test of scalability 
can help one to eliminate certain obvious cases of misinterpretation
of the meaning of an item. But such ex post facto rationalizations
are to be rigorously avoided. If the decision is made
that this particular series of items represents the universe of 
interest, then eliminating items must result in a redefinition of 
one’s interests. Whether or not an item belongs to the universe 
must not be a decision based upon some “correlational" test— 
there must be an adequate “content" interpretation for both 
acceptance and rejection. 

Our answer to the problem of which factors to consider in a 
scale, therefore, is that one must first define the universe in 
which one is interested. This definition of the universe is a subjective
one and consists of the listing of characterizing aggregates
of items. The actual series of items that one uses in
practice can be conceived of as a sample of items from the un¬ 
limited number that exists in the universe of content. The prob¬ 
lem now becomes one of determining how valid a representation 
of the total universe the selected sample is. The answer to this 
problem depends upon the determination of the dimensionality 
of the universe. Does the universe consist of a single dimension?
To answer this question, we turn next to a consideration of
"dimensionality."

The Concept of a "Uni-dimensional" Aggregate of Items 3

Let us assume that we now have a tentative set of char¬ 
acterizing items to be used for the classification of some social 
phenomenon. What are the different patterns of inter-relation¬ 
ships which these items can assume and of what importance are 
these patterns to the problem of scale construction? 

As an example of what might occur in the way of inter¬ 
relationships, let us start out with a simple case of three items 
only. Suppose, for example, in the previous problem of classify¬ 
ing individuals according to how successful they are as lawyers, 
we had decided to use the following three items: 

1. Did he have an income of over $25,000 a year?

2. Was he the author of any books on law? 

3. Had he ever received any honors from the bar association? 

Suppose further that each item had been answered either
"yes" or "no."

Conceivably then we might have the following eight types 
occurring among the lawyers whom we are interested in 
classifying: 


Type     Item 1      Item 2      Item 3
         (Money)     (Books)     (Honors)

  1      Yes         Yes         Yes
  2      Yes         Yes         No
  3      Yes         No          Yes
  4      No          Yes         Yes
  5      No          No          Yes
  6      No          Yes         No
  7      Yes         No          No
  8      No          No          No


We are now faced with the problem of ordering the above 
eight types according to how successful each type is as a lawyer. 
Types 1 and 8 give us no trouble; type 1, possessing all three of 
the characterizing items of success, is most successful; and type 8,
possessing none of the characterizing items, is least successful.
However, we find that types 2, 3 and 4 each possess two of the
characterizing items of success. How are we to rank these
three types relative to each other? Should we give least weight
to "honors," and rank type 2 above types 3 and 4, or should we

3 This concept of a "uni-dimensional" universe has been derived from the theory
of scaling developed by Louis Guttman. See "A Basis for the Scaling of Qualitative
Data," American Sociological Review, IX (1944), 139-150.

give least weight to "books," and thus rank type 3 above
types 2 and 4? The same problem of weighting applies to types 
5, 6 and 7 each of which possesses one of the characterizing 
items. We are faced with the need to make some decision as to 
how much weight to assign to each of the characterizing items. 
We shall therefore call any aggregate of items with the above 
pattern of inter-relationship, aggregates which present a problem
of relative weights. Rank order for such a pattern cannot be
determined without assigning weights to the different items. 
Furthermore, depending upon the relative weights assigned, 
this rank order can vary with different sets of weights. This we 
recognize as the second problem of scale construction: how
much weight to give to each item.
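
The dependence of rank order on weights can be checked by direct enumeration. The following is only a minimal sketch in Python, assuming that a type's score is the weighted sum of the items it possesses; the names `TYPES` and `rank_order` and the two weight sets are illustrative choices, not part of the original argument.

```python
# The eight response types for (money, books, honors); 1 = Yes, 0 = No.
TYPES = {
    1: (1, 1, 1),
    2: (1, 1, 0),
    3: (1, 0, 1),
    4: (0, 1, 1),
    5: (0, 0, 1),
    6: (0, 1, 0),
    7: (1, 0, 0),
    8: (0, 0, 0),
}


def rank_order(types, weights):
    """Group type numbers from highest to lowest weighted score (ties share a group)."""
    scores = {t: sum(w * x for w, x in zip(weights, resp)) for t, resp in types.items()}
    groups = {}
    for t, s in scores.items():
        groups.setdefault(s, []).append(t)
    return [sorted(groups[s]) for s in sorted(groups, reverse=True)]


# Two hypothetical weightings of (money, books, honors).
least_weight_on_honors = rank_order(TYPES, (3, 2, 1))
least_weight_on_books = rank_order(TYPES, (3, 1, 2))

print(least_weight_on_honors)
print(least_weight_on_books)
```

Under the first weighting type 2 ranks above type 3, under the second the two change places, while types 1 and 8 keep the top and bottom positions in both.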

We now come to an important question, “Are there any 
aggregates of items which do not present a problem of relative 
weights?” It is to be expected that an affirmative answer to this 
question would depend upon our ability to find an aggregate of 
items which formed a rather special pattern of inter¬ 
relationships. 

Let us illustrate one such pattern by means of the previous 
example. Suppose we found that the relationship between the 
three characterizing items was such that only four out of the 
eight possible types actually occurred. There would be (a) the 
type that possessed all three characteristics, (b) the type that 
possessed characteristics 2 and 3 only, (c) the type that pos¬ 
sessed characteristic 3 only, and finally (d) the type that 
possessed none of the characteristics. In other words, only types 
1, 4, 5 and 8, as listed above, would be found to occur in
actuality. 

Let us repeat this listing of types including only the above 
four types. 


Type     (Money)     (Books)     (Honors)

  1      Yes         Yes         Yes
  4      No          Yes         Yes
  5      No          No          Yes
  8      No          No          No


Under what conditions could we expect the occurrence of
only the above four types? The answer to this question is found
in the pattern of inter-relationship between the items. First, we
find that the types can be ordered, depending upon the num¬ 
ber of characteristics each type possesses. No two types have the 
same number of characteristics. Second, we find that the items or 
characteristics can be ordered, depending upon the number of 
types that possess that characteristic. No two items are possessed 
by the same number of types. 

The result of this ordering process of characteristics and 
types produces a definite pattern of inter-relationship. This 
pattern can be easily recognized if we separate the possession
of a characteristic from its absence, and then order both char¬ 
acteristics and types according to frequency of occurrence. 
The result of such an ordering process is a parallelogram. 

This pattern could be represented as follows: 


                                       Does Not    Did Not     Has Not
         Has       Wrote     Received  Have        Write       Received
Type     Money     Books     Honors    Money       Books       Honors

  1      X         X         X
  4                X         X         X
  5                          X         X           X
  8                                    X           X           X


where an X represents the characteristics of each type. A 
parallelogram pattern such as the above offers no problem in 
weights. No matter what weights were given to each of the
items, the rank order of Types 1, 4, 5, and 8 would be the same,
because each type possesses all of the characteristics of the 
type below it, and one more in addition. The rationale for such 
a pattern will become clearer after the following discussion. 
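
The invariance claimed for the parallelogram pattern can be illustrated numerically. This is a rough sketch rather than a proof: it assumes scores are weighted sums and that every weight is positive (a negative weight would reverse the argument), and the names `SCALE_TYPES` and `ranking` are hypothetical.

```python
import random

# The four types that occur in the parallelogram pattern; 1 = Yes, 0 = No.
SCALE_TYPES = {1: (1, 1, 1), 4: (0, 1, 1), 5: (0, 0, 1), 8: (0, 0, 0)}


def ranking(types, weights):
    """Type numbers sorted from highest to lowest weighted score."""
    def score(resp):
        return sum(w * x for w, x in zip(weights, resp))
    return sorted(types, key=lambda t: score(types[t]), reverse=True)


rng = random.Random(0)  # fixed seed so the run is repeatable
rankings = set()
for _ in range(1000):
    # Assumption: arbitrary but strictly positive weights.
    weights = tuple(rng.uniform(0.1, 10.0) for _ in range(3))
    rankings.add(tuple(ranking(SCALE_TYPES, weights)))

print(rankings)
```

Every trial gives the same order because each type's score is the score of the type below it plus one additional positive weight.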

Another method of deriving this special pattern of relation¬ 
ship between characterizing items which do not present a prob¬ 
lem of weights would be by means of cross-tabulation. What 
form must a cross-tabulation between two items take in order 
for a rank order based upon these items to be independent of 
any weights the items might be assigned? One form, of course, 
would occur if these two items were perfectly correlated. There 
would be only two types of individuals in such a case—those 
with both characteristics and those with neither characteristic. 

This perfect correlation may be represented by a fourfold
table as follows:

                   Item 1
                   +      -
Item 2      +      N      0
            -      0      N


where + indicates the presence of that characteristic and - indicates
its absence. On the basis of this type of relationship
between items, all of the individuals in the + + cell may be
ranked above those in the - - cell. No matter what weights
are given to the items, the rank order will remain the same.

A second possibility is that individuals may fall into three of 
the cells of such a fourfold table, as follows: 

                   Item 1
                   +      -
Item 2      +      N      N
            -      0      N



Here again there is no problem of relative weights to be assigned
the two items. Those individuals in the + + cell would receive
the highest rank order, those individuals in the - - cell would
receive the lowest rank order, while the only other group in the
+ - cell would fall in between the highest and the lowest
ranks. Again no matter what weights were given to the items,
the rank order would remain the same.

Finally, a third possibility is that individuals would fall into 
all four cells of the table, as follows: 

                   Item 1
                   +      -
Item 2      +      N      N
            -      N      N


While there is no problem of ranking in relation to the + + cell
and the - - cell, we find that how the individuals in the other
two cells were ranked would be completely dependent upon the
relative weights given to the two items. Here, then, we have the
problem of the relative importance of items or the problem of
weighting in scale construction.

A cross-tabulation between any two dichotomous items in a
series, therefore, must have the following characteristic in order
for rank order to be independent of item weights: all cross-tabulations
between the items should result in the absence of
any cases in one of the cells which represents a "positive"
answer on one item and a "negative" answer on another item.
This zero-cell, furthermore, must occur in the column which
contains the lowest positive frequency. For example, a cross-tabulation
between items 2 and 3 of the previous example would
have to look as follows:

                        Item 2 (Books)
                        Yes       No
Item 3        Yes       N         N
(Honors)      No        0         N



There should be no individuals who have written books, but 
who have not received any honors. Any characterizing item 
that is the property of a lower rank must also be the property of 
all higher ranks, while the lower rank must lack the distinguish¬ 
ing characterizing item of the upper rank. Thus, since “honors” 
is a characteristic of Type 5, it must also be a characteristic of 
Type 4 (a higher rank), but Type 5 in turn must lack the dis¬ 
tinguishing characteristic of the higher rank, in this case 
“books.” 
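
The zero-cell requirement for a pair of dichotomous items can be checked mechanically. The sketch below is illustrative only: the response lists are invented, `other_item` stands for some hypothetical item outside the scale, and the helper names `crosstab` and `rank_is_weight_free` are assumptions introduced here.

```python
def crosstab(a, b):
    """Fourfold table counts keyed by (answer on a, answer on b); 1 = +, 0 = -."""
    table = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for x, y in zip(a, b):
        table[(x, y)] += 1
    return table


def rank_is_weight_free(a, b):
    """True when at least one of the two mixed cells (+ - or - +) is empty."""
    t = crosstab(a, b)
    return t[(1, 0)] == 0 or t[(0, 1)] == 0


# Invented answers from five lawyers (1 = Yes, 0 = No).
books = [1, 1, 0, 0, 0]
honors = [1, 1, 1, 0, 0]
other_item = [0, 1, 1, 0, 1]  # hypothetical item that does not fit the scale

books_by_honors = crosstab(books, honors)
print(books_by_honors[(1, 0)])                  # authors who have no honors
print(rank_is_weight_free(books, honors))
print(rank_is_weight_free(books, other_item))
```

When the empty cell is the one for "positive on the first item, negative on the second," the first item is necessarily the less frequent of the two, which agrees with the requirement about the column of lowest positive frequency.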

The parallelogram pattern which permits the determination 
of a rank order without presenting the need for assigning arbi¬ 
trary weights to the various items will be called a uni-dimen¬ 
sional pattern. Such a uni-dimensional pattern can be de¬ 
termined empirically, first by ordering items according to 
ascending order of positive frequencies, i.e., “money” is a char¬ 
acteristic of fewest lawyers, and is therefore placed before 
“books” which in turn precedes “honors,” and then by ordering 
individuals according to the number of characterizing items 
they possess. If, as a result of this ordering of items and individuals,
the aggregate of items with which one is dealing forms
a parallelogram pattern, then we can proceed to classify individuals
according to a rank order which is independent of any
weights which the items might be given. Such a rank order has
the property of permitting one to derive from the rank order
the exact characteristics of the individuals in that rank, since
there is only one possible combination of items for any single
rank order. Furthermore, the rank order has the quality that
any individuals in a higher rank possess all the characteristics of
the individuals in a lower rank, and at least one more in addition.
This property of reproducibility of characteristics from a
knowledge of rank order can only be present where the aggregate
of characterizing items does not present a problem of relative
weighting. It permits a more clear-cut rationale for ranking
individuals along a single continuum than is possible when the
rank order must be based upon an arbitrary decision of how
much weight to assign each item. 4

The aggregate of items which permits such a rank order,
independent of item weights, will be called a "scale" and the
universe of which the items are a sample will be called a scalable
universe. Since the universe is scalable, any selection of items
from that universe would result in the same rank order of objects
or persons as any other selection. A scale in the present
usage is therefore an aggregate of items which are so inter-related
as to offer no problem of relative weighting. 5
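
The empirical procedure described above (order the items by ascending positive frequency, order the individuals by number of items possessed, then look for the parallelogram) might be sketched as follows. This is an illustration under stated assumptions, not the Cornell technique itself: it tests only for a perfect pattern with no allowance for error, and the function names and the response data are invented.

```python
def is_parallelogram(responses):
    """Check whether 0/1 response tuples form a perfect parallelogram pattern."""
    n_items = len(responses[0])
    # Order items by ascending positive frequency, rarest characteristic first.
    freq = [sum(r[i] for r in responses) for i in range(n_items)]
    order = sorted(range(n_items), key=lambda i: freq[i])
    for r in responses:
        score = sum(r)
        reordered = [r[i] for i in order]
        # A person with k characteristics must hold exactly the k most common ones.
        if reordered != [0] * (n_items - score) + [1] * score:
            return False
    return True


def pattern_from_rank(score, n_items):
    """Reproduce the response pattern (rarest item first) implied by a rank score."""
    return (0,) * (n_items - score) + (1,) * score


# Invented lawyers answering (money, books, honors); 1 = Yes, 0 = No.
scalable = [(1, 1, 1), (0, 1, 1), (0, 1, 1), (0, 0, 1), (0, 0, 0)]
not_scalable = scalable + [(1, 0, 0)]  # money without books or honors

print(is_parallelogram(scalable))
print(is_parallelogram(not_scalable))
print(pattern_from_rank(2, 3))
```

`pattern_from_rank` expresses the reproducibility property: in a perfect scale, knowing that a lawyer possesses two of the three items is enough to say which two.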

A Test of "Single Meaning"

To a limited extent, scale analysis can be used as a test of the 
"meaning" of items in an effort to eliminate items which do not 
belong to the scalable universe. However, there must be an 
adequate “content" reason in addition to the “correlational" 
analysis. In many cases, the correct decision would be to label 
one's universe of interest as multi-dimensional, and therefore
not scalable, rather than to attempt to tease out a scalable sub-group
of items which no longer reflects the desired universe.

4 Simple techniques for testing a series of items for unidimensionality based upon
the determination of whether or not a parallelogram pattern exists have been
developed. See, for example, Guttman, L., "The Cornell Technique for Scale and Intensity
Analysis," Educational and Psychological Measurement, VII (1947).

5 It is important to remember that many universes will be found to present a
problem of weighting constituent items and that much work remains to be done in
solving the problem of classification for such areas. Scaling is not a solution to the
problem of weighting, but rather a selection of areas which do not present a problem
of weighting.

Let us illustrate this problem of "meaning” by means of an 
example. Suppose, in the previous example of the classification 
of lawyers according to "success,” the item, "Does he have an 
income of over $25,000 a year?” had been asked, instead, as, 
“Does he have an income of over $25,000 a year which he has 
earned honestly?" Whereas an answer of “No” to the former 
wording of the question has a clear-cut single meaning, the 
answer of "No” to the latter wording may mean either that he 
does not have a high income or that he has a high income but, 
in the opinion of the respondent, he has not earned it honestly. 
The response to this latter question depends upon the aspect or 
element upon which the subject focuses. The question can have 
more than one interpretation. Such questions have been called 
"double-barrelled,” and their use for classification purposes is 
limited by the fact that different subjects may be responding to 
different aspects of the question. 

While the presence of double-meaning is relatively easy to 
determine in the case of a single question, there is another type 
of double-meaning which is not so easily detected. As was 
discussed in the first section, the study of social phenomena 
involves the sampling of items from a whole universe of items 
characteristic of those phenomena. This use of an aggregate of 
items permits the occurrence of a new type of double-meaning:
different meanings for the different items in the aggregate. This
problem is quite different from that of double-meaning in the 
single question, as can be illustrated by the following example. 

Suppose, in the selection of items characterizing a successful 
lawyer, we had carefully avoided any single items with possible 
double-meanings. But we now add a fourth item, "Does he 
have children?” We now have the following list of questions: 

1. Does he have an income of over $25,000 a year?

2. Has he written books? 

3. Has he received any honors? 

4. Does he have children? 

Let us assume that there are no double-meanings in any single 
one of the above questions. However, a new problem arises. 
This problem may be stated as, "Do all of the above questions
deal with the same topic?" This problem is different from the
previous one stated as, "Does this question call for a response
dealing with a single topic?" The new problem is one of determining
the single meaning of a series of questions, each of
which has been judged individually to contain only a single
meaning. In other words, we must now determine whether or
not the single topic studied by each of the items is the same
single topic for all of the items. The problem of meaning for a
single question is, "Does the individual question produce a
response to only a single topic?", while the problem of meaning
for an aggregate of questions is, "Is the single topic studied by
each of the questions the same for each question?" 6

The proposed parallelogram test for uni-dimensionality would 
serve to indicate in a series of scalable items whether or not any 
of the items did not deal with the same dimension indicated 
by a large majority of the items. Such double-barrelled items 
as "Does he have an income of over $25,000 a year which he
has earned honestly?" or such extra-dimensional items as "Does
he have children?" would not conform to the parallelogram
pattern. 
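
One way to see an extra-dimensional item stand out is to count, for each item, how many fourfold tables with other items have both mixed cells occupied. The sketch below is a simplified illustration with invented answers; `conflicts` and `conflict_counts` are hypothetical helpers, and the counting rule is an assumption made for this example rather than a procedure given in the text.

```python
from itertools import combinations

ITEMS = ("money", "books", "honors", "children")

# Invented answers from eight lawyers; 1 = Yes, 0 = No.
LAWYERS = [
    (1, 1, 1, 0),
    (1, 1, 1, 1),
    (0, 1, 1, 0),
    (0, 1, 1, 1),
    (0, 0, 1, 1),
    (0, 0, 1, 0),
    (0, 0, 0, 1),
    (0, 0, 0, 0),
]


def conflicts(responses, i, j):
    """True when both mixed cells of the item-i by item-j table are occupied."""
    plus_minus = any(r[i] == 1 and r[j] == 0 for r in responses)
    minus_plus = any(r[i] == 0 and r[j] == 1 for r in responses)
    return plus_minus and minus_plus


def conflict_counts(responses, names):
    """Number of other items with which each item forms a four-cell table."""
    counts = {name: 0 for name in names}
    for i, j in combinations(range(len(names)), 2):
        if conflicts(responses, i, j):
            counts[names[i]] += 1
            counts[names[j]] += 1
    return counts


counts = conflict_counts(LAWYERS, ITEMS)
print(counts)
```

In these invented data the first three items conflict only with "children," which conflicts with all of them. As the text insists, such a result is at most a pointer: dropping the item still requires an adequate "content" reason.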

Summary 

The problem of scale construction has often been stated as 
involving (1) the problem of item selection and (2) the problem 
of item weights. The present paper offers a logical system for 
scale construction which answers these two problems in terms 
of a test for uni-dimensionality. Any series of items used in a 
scale can be conceived of as a sample of items from an unlimited 
universe of items dealing with the variable being studied. If a 
test of the inter-relationships of these items shows them to con¬ 
form to a defined parallelogram pattern, then the rank order of 
objects or individuals based upon these items will be independ¬ 
ent of item weights. Furthermore, in such a case, the rank order 
will pertain to the entire universe of items and any selection of 
items from that universe will produce the same rank order as 
any other selection. 

6 This question of meaning, of course, could be stated the same for both single
items and aggregates of items as follows, "Is a single topic only being studied?" The
present formulation, however, is important for an understanding of the methods used
to answer this question for a series of items.

Thus, according to this approach, the problem of scale construction
becomes a problem of testing a series of items for uni-dimensionality.
If the items conform to the prescribed scale
pattern, then the problems of item selection and item weights 
are non-existent. This approach therefore involves a test for 
scalability in the area of interest, rather than the construction 
of some arbitrary scoring scheme. In this sense, scales can only 
be constructed for uni-dimensional variables. If the underlying 
variable is shown to be uni-dimensional, the rank order of 
objects or persons is independent of item selection and item 
weighting. If the underlying variable is shown to be multi¬ 
dimensional, then a meaningful single rank order is impossible. 

It is the task of the research worker, therefore, first, to define 
his area of interest by listing those items which characterize the 
universe in which he is interested, and, second, to test these 
items for uni-dimensionality. If the test shows that the universe 
is not uni-dimensional, then he cannot construct a meaningful 
scale by arbitrary decisions of item selection and item weight¬ 
ing. If the test shows that the universe is uni-dimensional, 
then the problems of item selection and item weights are non¬ 
existent. 



VALIDITY, RELIABILITY, AND BALONEY 1 


EDWARD E. CURETON
University of Tennessee

It is a generally accepted principle that if a test has demonstrated
validity for some given purpose, considerations of reliability
are secondary. The statistical literature also informs us
that a validity coefficient cannot exceed the square root of the 
reliability coefficient of either the predictor or the criterion. This 
paper describes the construction and validation of a new test 
which seems to call in question these accepted principles. Since 
the technique of validation is the crucial point, I shall discuss 
the validation procedures before describing the test in detail. 

Briefly, the test uses a new type of projective technique which 
appears to reveal controllable variations in psychokinetic force
as applied in certain particular situations. In the present study 
the criterion is college scholarship, as given by the usual grade- 
point average. The subjects were 29 senior and graduate stu¬ 
dents in a course in Psychological Measurements. These stu¬ 
dents took Forms Q and R of the Cooperative Vocabulary Test, 
Form R being administered about two weeks after Form Q.
The correlation between grade-point average and the combined 
score on both forms of this test was .23. The reliability of the 
test, estimated by the Spearman-Brown formula from the corre¬ 
lation between the two forms, was .90. 
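
The two figures just quoted can be related through the Spearman-Brown formula and the square-root ceiling on validity mentioned earlier. The sketch below is a worked illustration only: the correlation between the two forms is not reported in the text, so the value computed here is back-derived from the stated reliability of .90 and should be read as an implied figure, and the function names are assumptions.

```python
import math


def spearman_brown(r, k=2):
    """Reliability of a test lengthened k times, given the correlation r between parts."""
    return k * r / (1 + (k - 1) * r)


def implied_part_correlation(reliability, k=2):
    """Invert the Spearman-Brown formula: the part correlation implied by a reliability."""
    return reliability / (k - (k - 1) * reliability)


reported_reliability = 0.90   # stated for the two vocabulary forms combined
reported_validity = 0.23      # stated correlation with grade-point average

r_forms = implied_part_correlation(reported_reliability)  # implied, not reported
ceiling = math.sqrt(reported_reliability)                 # upper bound on validity

print(round(r_forms, 3))
print(round(ceiling, 3))
print(reported_validity <= ceiling)
```

On these assumptions the implied correlation between Forms Q and R is about .82, and the reported validity of .23 lies far below the ceiling of about .95 that a reliability of .90 allows.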

The experimental form of the new test, which I have termed 
the “B-Projective Psychokinesis Test,” or Test B, was also 
applied to the group. This experimental form contained 85 
items, and there was a reaction to every item for every student.
The items called for unequivocal “plus” or “minus” reactions,
but in advance of data there is no way to tell which reaction to 
a given item may be valid for any particular purpose. In this
respect Test B is much like many well-known interest and personality
inventories. Since there were no intermediate reactions,
all scoring was based on the “plus” reactions alone.

1 This paper was presented in Denver, Colorado, September 7, 1949, at a meeting
sponsored jointly by the Division on Evaluation and Measurement of the American
Psychological Association and the Psychometric Society.

I first obtained the mean grade-point average of all the stu¬ 
dents whose reaction to each item was “plus.” Instead of using 
the usual technique of biserial correlation, however, I used an 
item-validity index based on the significance of the difference 
between the mean grade-point average of the whole group, and 
the mean grade-point average of those who gave the “plus” 
reaction to any particular item. This is a straightforward case 
of sampling from a finite universe. The mean and standard 
deviation of the grade-point averages of the entire group of 
29 are the known parameters. The null hypothesis to be tested 
is the hypothesis that the subgroup giving the “plus” reaction 
to any item is a random sample from this population. The mean 
number giving the “plus” reaction to any item was 14.6. I
therefore computed the standard error of the mean for independent
samples of 14.6 drawn from a universe of 29, with replacement.
If the mean grade-point average of those giving the
“plus” reaction to any particular item was more than one standard
error above the mean of the whole 29, the item was retained
with a scoring weight of plus one. If it was more than one standard
error below this general mean, the item was retained with a
scoring weight of minus one.

By this procedure, 9 positively weighted items and 15 negatively
weighted items were obtained. A scoring key for all 24
selected items was prepared, and the “plus” reactions for the
29 students were scored with this key. The correlation between
the 29 scores on the revised Test B and the grade-point averages
was found to be .82. In comparison with the Vocabulary
Test, which correlated only .23 with the same criterion, Test B 
appears to possess considerable promise as a predictor of college 
scholarship. However, the authors of many interest and per¬ 
sonality tests, who have used essentially similar validation 
techniques, have warned us to interpret high validity coef¬ 
ficients with caution when they are derived from the same data 
used in making the item analysis. 

The correlation between Test B and the Vocabulary Test
was .31, which is .08 higher than the correlation between the
Vocabulary Test and the grade-point averages. On the other
hand, the reliability of Test B, by the Kuder-Richardson Formula
20, was -.06. Hence it would appear that the accepted
principles previously mentioned are called in question rather
severely by the findings of this study. The difficulty may be
explained, however, by a consideration of the structure of the
B-Projective Psychokinesis Test.
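For readers who want to see how a reliability of this kind is computed, the following is a minimal sketch of Kuder-Richardson Formula 20 for items scored 0 or 1. The function name and the small response matrix are hypothetical; the paper's own item data are not given, and its key used weights of plus and minus one, which this sketch does not reproduce.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for a persons-by-items matrix of 0/1 scores."""
    n_persons = len(responses)
    k = len(responses[0])
    # Proportion of persons scoring 1 on each item
    p = [sum(row[i] for row in responses) / n_persons for i in range(k)]
    sum_pq = sum(pi * (1.0 - pi) for pi in p)
    # Population variance of the total scores
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - sum_pq / total_var)


# Hypothetical data: four persons, three items
example = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(example))
```

When items do not hang together, the sum of the item variances can approach or exceed the variance of the total score, and the coefficient then falls to zero or below, as it does for Test B.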

The items of Test B consisted of 85 metal-rimmed labelling
tags. Each tag bore an item number, from 1 to 85, on one side
only. To derive a score for any given student, I first put the 85
tags in a cocktail shaker and shook them up thoroughly. Then
I looked at the student's grade-point average. If it was B or
above, I projected into the cocktail shaker a wish that the student
should receive a high “plus” reaction score. If his grade-point
average was below B, I projected a wish that he should receive
a low score. Then I threw the tags on the table. To obtain
the student’s score, I counted as “plus” reactions all the tags
which lit with the numbered side up. The derivation of the term
“B-Projective Psychokinesis Test” should now be obvious.

The moral of this story, I think, is clear. When a validity
coefficient is computed from the same data used in making an
item analysis, this coefficient cannot be interpreted uncritically.
And, contrary to many statements in the literature, it cannot be
interpreted “with caution” either. There is one clear interpretation
for all such validity coefficients. This interpretation is—


“Baloney!”

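The demonstration is easy to repeat in simulation. The sketch below is not the author's procedure in every detail: the grade-point averages are invented (drawn from a normal distribution), each tag is assumed to land numbered side up with probability one half, and the standard error uses the average subgroup size as the text describes. All function names are illustrative.

```python
import random
from statistics import mean, pstdev


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = (sxx * syy) ** 0.5
    return sxy / denom if denom else 0.0


def throw_tags(rng, n_students, n_items):
    """One throw of the tags per student: True means numbered side up."""
    return [[rng.random() < 0.5 for _ in range(n_items)]
            for _ in range(n_students)]


def score(tags, key):
    """Sum the +1/-1 weights of the keyed items that came up 'plus'."""
    return [sum(w for i, w in key.items() if row[i]) for row in tags]


def simulate(n_students=29, n_items=85, seed=0):
    rng = random.Random(seed)
    gpa = [rng.gauss(2.5, 0.5) for _ in range(n_students)]  # hypothetical GPAs
    tags = throw_tags(rng, n_students, n_items)

    grand_mean, sigma = mean(gpa), pstdev(gpa)
    n_plus = [sum(row[i] for row in tags) for i in range(n_items)]
    # Standard error of a mean for a subgroup of average size
    se = sigma / mean(n_plus) ** 0.5

    key = {}
    for i in range(n_items):
        plus_gpa = [g for g, row in zip(gpa, tags) if row[i]]
        if not plus_gpa:
            continue
        diff = mean(plus_gpa) - grand_mean
        if diff > se:
            key[i] = 1       # retained with weight plus one
        elif diff < -se:
            key[i] = -1      # retained with weight minus one

    # "Validity" on the same data used to build the key
    r_same = pearson(score(tags, key), gpa)
    # Re-throw the tags and score the new reactions with the old key
    r_new = pearson(score(throw_tags(rng, n_students, n_items), key), gpa)
    return key, r_same, r_new


key, r_same, r_new = simulate()
print(len(key), round(r_same, 2), round(r_new, 2))
```

Every keyed item contributes a positive covariance with the criterion by construction, so the same-data coefficient is necessarily positive and is typically large. The coefficient after a fresh throw has no such guarantee and should scatter around zero.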


RESPONSE SETS: A NOTE ON CONSISTENCY 
IN TAKING EXTREME POSITIONS 

EDWARD A. RUNDQUIST
Owens-Illinois Glass Company

“A response set is ... any tendency causing a person to
give different responses to test items than he would when the
same content was presented in different form.” Thus Cronbach
(1) defines response sets in a recent summary of the wide range
of situations in which such sets have been found. 

In personality testing, response sets can be deliberate at¬ 
tempts to deceive, reflections of basic drives or traits, reflec¬ 
tions of a particular frame of reference, or a temporary set 
brought about by a particular way of interpreting the direc¬ 
tions. On just what a response set reflects and how consistently 
it reflects it, will depend the importance that is attributed to it. 
If a response set is transient and dependent primarily on the 
given conditions of an immediate situation, interest will be 
confined to controlling its influence so it does not interfere 
with the interpretation of test results. If, however, it influences 
behavior in a variety of situations over a long period of time, 
it would be worthy of careful study as a means of personality 
measurement. 

Among other response sets noted by Cronbach is the tendency
to take the extreme positions on scales of the Like-Indifferent-Dislike
or the Agree-Undecided-Disagree type.
This note reports on the consistency of this tendency in two
situations, one immediately following the other. In the first,
111 factory girls, all doing the same work, describe themselves
by indicating how well each of 200 descriptive words and 
phrases apply to them; in the second, how well they liked or 
disliked each of 100 activities. 

As Cronbach notes, to measure the consistency of any response
set, the situations involved must allow equal opportunity
for it to be called out, i.e., the situations must be equally
indefinite or unstructured. Whether the personality and interest
items with their respective directions provide this may
be judged from the material appended to this note. To the 
writer there seems at least approximately equal opportunity 
for the set to take extremes to operate. The fact that the two 
forms were presented in immediate succession, the personality 
form first, would increase the likelihood for the same set to 
operate while taking both forms. 

Substantial individual differences exist in the tendency to 
take the extreme position. The mean and sigma for the 200
personality items are 74.95 and 37.18; for the 100 interest items,
37.47 and 14.14. To obtain these scores, the number of A and 
E responses (see key at end of paper) for each series were 
summed. 
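The scoring rule just described amounts to a simple count. A minimal sketch follows; the function name and the sample answers are hypothetical.

```python
def extreme_score(answers):
    """Count the responses at either end of the five-point key (A or E)."""
    return sum(1 for a in answers if a.upper() in ("A", "E"))


# Six hypothetical answers, three of them extreme
print(extreme_score(["A", "C", "E", "B", "E", "D"]))
```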

The correlation between this tendency on the two series of 
items is .40. This is significantly different from zero. (Sigma of 
an r of zero with an N of 111 is .1.)
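The significance statement can be checked with the usual large-sample standard error of a zero correlation, one over the square root of N minus one. The sketch assumes a sample size of 111, which agrees with the stated sigma of about .1; the function name is illustrative.

```python
from math import sqrt


def sigma_r_zero(n: int) -> float:
    """Large-sample standard error of r when the true correlation is zero."""
    return 1.0 / sqrt(n - 1)


n = 111                 # sample size, assumed to be 111 here
se = sigma_r_zero(n)
ratio = 0.40 / se       # observed r expressed in standard-error units
print(round(se, 3), round(ratio, 1))
```

On this assumption the observed correlation lies more than four standard errors from zero, which supports the claim of significance while leaving the author's point about its modest size untouched.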

There is, then, a real tendency for those who take extreme 
positions in describing their traits to take extreme positions 
in describing their interest. On the basis of a consistency repre¬ 
sented by a correlation of .4 in two similar and immediately 
successive situations, it is hard to believe this particular re¬ 
sponse set is reflecting anything basic about the individual. 
It seems rather that it is largely a function of the type of material,
interpretation of directions, mood, or some other temporary
condition. At least with a consistency of .4, we would
not expect a measure to be very useful in predicting a criterion
such as behavior on a job. With personality and interest items
of the kind dealt with here, it would seem more profitable to
eliminate the operation of this response set rather than to attempt
to use it as a measure.

Directions for Personality Items 

On the following pages are words and phrases used in describing
people. You are to describe yourself by indicating how
well each description applies to you. Use the following key:

Key: A. Describes me perfectly or almost perfectly.

B. Describes me unusually well.

C. Describes me fairly well.

D. Describes me some but not very well.

E. Describes me slightly or not at all.

Indicate your answer by putting the letter that applies on the
line in front of each description. Suppose the word is “helpful.”

If you feel it describes you fairly well, you would place a C 
on the line before it, thus: 

C Helpful 

If you feel that this word describes you some but not 
very well, you would place a D on the line thus: 

D Helpful 

If you feel that the word describes you perfectly or almost 
perfectly, you would put an A on the line, thus: 

A Helpful 

Look at each word or phrase and decide how well it describes 
you. Do not worry about being consistent but consider each 
description by itself. Do not skip any. 

1. Cheerful
4. Know my own mind
8. Cooperative (like to help people)
13. Restless (never still a minute)
14. Worry about the future
24. Have high ideals
28. Stubborn
41. Like to be different
44. Jealous
46. Always on time

Directions for Interest Items 

On the following page is a list of activities. Indicate how 
much you like or dislike each one. Use this key in indicating 
how much you like or dislike it. 

Key: A. Like a great deal 

B. Like some 

C. Neither like nor dislike 

D. Dislike some 

E. Dislike a great deal 

You may not have done all the activities listed. Further, 
some require training which you may not have had. For these 
indicate how much you think you would like them if you 
tried them and if you had the proper training. Answer every 
item. Give your first reactions. Work rapidly. 

3. Work around machinery
4. Arrange flowers
8. Teach English
10. Soft and slow music
16. Visit a canning factory
18. Go to parties often
19. Trying out new cooking recipes
28. Read a book
30. Tidy up the house
38. Look up words in a dictionary

1. Cronbach, L. J. “Response Sets and Test Validity.” Educational
and Psychological Measurement, VI (1946),
475-494.



THE INTERESTS OF ART STUDENTS


WALTER R. BORG
University of Texas

Introduction 

The aim of this study is to attempt to answer the following 
questions: 

1. Does the Kuder Preference Record differentiate an art
group from general-population samples with respect to
art-interest scores?

2. Is success in art courses significantly related to areas of
interests as measured by the Kuder Preference Record?

3. Do Kuder profiles of groups of students specializing in
different areas of art differ significantly?

Preliminary Study

The Strong Vocational Interest Blank was used by the in¬ 
vestigator in a preliminary study of 85 upper division art-col¬ 
lege students and was not found to be useful in differentiating 
levels of artistic ability or revealing individual art interests, al¬ 
though, as a group, the art students studied were above the 
norms for non-art groups. Only 38 per cent of the art group 
studied in the preliminary investigation received "A” ratings in 
art interest although this group was made up entirely of ad¬ 
vanced art students. Seventy-two per cent of Strong's criterion 
group, consisting of 124 painters, 79 commercial artists, 20
sculptors, and 9 cartoonists, made “A” art ratings on the Vocational
Interest Blank. 1 The mean score for this group is given as
176.80 with a standard deviation of 88.08. 2 The mean for the
group tested in the preliminary study conducted by the author 
was 86 with a sigma of 100. Great differences between the 
general makeup of Strong’s criterion group and the art-college 
students tested possibly account for the differences in score. For
example, the average age of the criterion group is given by
Strong as 42.7 years, with average education 11.9 grade, indicating
a much more mature and somewhat less-educated
group than the advanced art students tested. 3 Because of the
above findings, it was decided to use the Kuder Preference
Record, Form IIP, in the present study.

1 Strong, E. K. Vocational Interests of Men and Women. Stanford University: Stanford
University Press, 1943. Page 730.

2 Taken from norms supplied with artist scoring key for Strong test.

Present Study

A total of 427 students at the California College of Arts and
Crafts at Oakland, California, were used as subjects in this
study. Of this group 299 were men (median CA 22-8), and 128
were women (median CA 19-7). Only students having completed
nine or more semester units of art work at the school
were studied.

Grade averages in art courses were used as the criterion for
art-college success. Reliability of art grades was computed by
comparing first-semester grade averages with subsequent grades
of 92 students having completed more than 45 semester units
of art work. A correlation of .84 was obtained, thus indicating
that art-course grades in this college are reasonably reliable.

The Kuder Preference Record is scored for nine areas of interest:
(1) Mechanical, (2) Computational, (3) Scientific, (4)
Persuasive, (5) Artistic, (6) Literary, (7) Musical, (8) Social
Service, and (9) Clerical. As each response constitutes a choice
of one area over two others, a picture of relative interest is
given and not absolute interest, as is the case with the Strong
test. The Kuder test has several advantages for research. Probably
most important is the ease of scoring and the possibility
of analyzing the scores. It is also comparatively easy to con¬ 
struct norms for selected groups when using the Kuder test and 
this was considered to be a useful undertaking as norms given 
by Kuder for art students and artists are not as complete as 
could be desired. 


Results 

Scores earned on area five of the Kuder Preference Record 
are intended to indicate interest in art. The mean scores of the 
427 students in the art group used in this study, are closely in 


3 Strong, loc. cit.

.08, which is not statistically significant U ct|ttals 1//0, Comparison
of the upper and lower 27 per cent of the subjects with
respect to grade point average in art courses revealed a small
and significant difference between means. The mean of the
upper group was 88.26, the lower group S< yj while the critical
ratio of the difference was 2.32. The difference in scores between
men and women was also slightly significant in favor of the
men, the critical ratio also being 2.32.

In comparing three groups of students specializing in different
areas of art at the California College of Arts and Crafts,
no significant differences were found in their scores on the Kuder
art-interest score. The means for the commercial art students,
fine arts students, and art-teaching students were Ky. 11. Sfi.Kj,
and Hfi.47 respectively. This places the means of all three groups
between the 97th and y«rh percentiles in art interest. Scores on
the Kuder Art Scale for the various groups tested may be found
in Table 1. It may be concluded that the Kuder Art Scale is
valuable in differentiating the art group from the general population.
The low correlation with art grades indicates that it is
not useful as an indicator of the degree of talent within an art
group. This correlation would probably be much higher in a
more heterogeneous group.

* Kuder, G. F. Revised Manual for the Kuder Preference Record. Chicago: Science
Research Associates, 1948. Page 12.

In addition to an analysis of the performance of the art group 
as a whole, it was decided to compare the performance of the 
three art-area groups of commercial art students, fine arts 
students, and art-teaching students in detail and construct 
profiles for them. It was considered most practical to first study 
these three area groups without regard for sex because of the 
small number of cases. Thus, the groups were first considered 
as a whole, and then the commercial art groups and the men’s 
teaching group which contain sufficient cases were studied with 
respect to sex. 

Table 1 gives scores of the three art-area groups on the nine 
Kuder Scales. It will be noted that there are no significant dif¬ 
ferences among the three groups in art interest, all scoring above 
the 95th percentile for both men’s and women’s norms. The 
commercial art group scored significantly higher than the other 
two groups in mechanical interest, it was superior to the teach¬ 
ing group in scientific and clerical interest, and was superior to 
the fine arts group in persuasive interest. 

Table 3 shows a comparison between men’s and women’s 
raw scores and percentile scores in the commercial art group. 6 
Although considerable sex difference exists, it will be seen from 
comparing raw scores and percentiles that these differences are 
markedly less than those given in the norms. For this reason 
it is probable that, until more complete norms are published, 
a comparison of raw scores would be simpler and more valid 
than conversion to norm-group percentiles when dealing with 
art students. In examining the performance of the art-teaching 
group it will be seen that this group scored significantly above 
the other groups in social service interest. In spite of this 
difference, the average score of the teaching group in this area 
is only 70.52, which is below the 50th percentile on the test
norms, indicating that consideration of scores in all areas, regardless
of percentile rank, may be more useful in some cases
than restricting attention to extreme scores. The male art-teaching
students were considered separately and their average
scores did not differ markedly from the entire teaching group in
any of the nine Kuder areas, thus giving some justification for
using the same raw-score norms for both sexes until more complete
norms are established by further research.

• Percentile scores taken from profile sheet for the Kuder Preference Record.

The fine arts group scored significantly above the other two 
groups in literary interest and also scored highest in music in¬ 
terest, being significantly above the commercial art group. Be¬ 
cause of the small size of the group no sex differences were com¬ 
puted. Some data which may be of help in evaluating the 
performance of art with respect to raw scores on the Kuder 
Preference Record may be found in Tables 1 and 3. 


TABLE 3

Comparison between Commercial Art Group Raw Scores and Percentile Scores on the
Kuder Art Scale

Group                  Mec     Com     Sci     Per     Art     Lit     Mus     Soc     Cle

Men's Raw Scores       73.55   14.99   5'.73   69.87   87.15   5^.45   21.05   56.04   45.53
Women's Raw Scores     59.95   «3.59   48.52   62.83   87.22   47.68   19.42   68.73   48.97
Men's Percentiles      38      16      53      46      99      63      73      H       35
Women's Percentiles    71      23      37      54      98      45      41      22      22


Summary and Conclusions 

With regard to the questions stated in the opening paragraph, 
the following conclusions may be drawn: 

1. The group of art students in this study scored very high on 
the Kuder Art Scale, the men averaging 99th percentile and the 
women 98th percentile, thus differentiating them adequately 
from the general population. 

2. The correlation between art-course success and art-interest
scores is not significant for the group studied. The homogeneity
of the art students with respect to level of art interest in part
accounts for this low correlation.

3. A comparison of interest profiles for commercial art, art
teaching, and fine arts students reveals that significant differences
do exist. The commercial art group is significantly superior
to both other groups in mechanical interest, exceeds the
fine arts group in persuasive interest, and is significantly above 
the art-teaching group in scientific interest and clerical interest. 

The art-teaching group is significantly superior to the com¬ 
mercial art group in music interest and exceeds the fine arts 
group in persuasive interest. The chief characteristic of the art¬ 
teaching group, however, is its social service interest which is 
significantly above that of the other art groups. 

The fine arts group was high in literary interest and low in 
persuasive interest, being significantly different from the other 
groups in both. The fine arts group also exceeds the commercial 
art group in music interest. 

All three groups score highest in art interest, but are very 
similar, all averaging between 95th and 98th percentiles accord¬ 
ing to the Kuder norms. These findings agree quite closely with 
interest clusters suggested by Kuder in the test Manual. Fur¬ 
ther study is necessary before the norms found in this investiga¬ 
tion can be regarded with complete confidence. 



A FACTORIAL INVESTIGATION OF FLEXIBILITY 1

In a previous investigation, performance on certain tests
which were designed to measure flexibility seemed to be in¬ 
fluenced by the ingestion of Benzedrine sulfate to a greater 
extent than was performance on "non-flexibility” tests (3). 
This evidence was not strong but was, none the less, provoca¬ 
tive. The present study was designed to investigate more 
thoroughly the nature of flexibility by subjecting modifications 
of these tests to a more rigorous analysis. For the purpose of 
this study flexibility is defined simply as the ability (a) to 
shift from one task to another, or (b) to break through an es¬ 
tablished set in order to perform a task. We have preferred to 
use the term "flexibility” rather than the word "persevera¬ 
tion,” which has frequently been used to describe the abilities 
measured by tests of the general kind used here, because the 
latter term so often has associated with it specific theoretical 
connotations, e.g., Spearman's mental inertia, Muller and 
Pilzecker's usage as a memory phenomenon, etc. 

In an attempt to make the results as unambiguous as possible 
it was decided to investigate only one type of performance, 
viz., performance in which S would be required to shift tasks. 
Only simple tasks were used in the hope that factors would be 
more easily identified. Tests were designed to measure numeri¬ 
cal, perceptual speed, and verbal factors. Within each area the 
attempt was to make some of the tests factorially pure. One 
test in each area, however, was designed to measure flexibility 
by requiring S to shift from one simple task to another. It
was anticipated that factors associated with number, perceptual
speed, and verbal abilities could be isolated from this battery.
The important consideration, however, was whether or not a
factor which was common primarily to the tests requiring shifts
of tasks would also emerge. If those tests which required shifts
of tasks appeared on an independent axis, regardless of the
type of ability represented, there would be evidence for a factor
which might be called “flexibility” common to different types
of tasks.

1 This study was aided by a grant from the Committee on Research of the Graduate
School of Northwestern University.


Description of Tests 

Thirteen tests comprised the battery analyzed in this study. 
All tests were speed tests and, with the exception of the Same-Opposite
Test, were administered in two parts so that estimates
of test-retest reliabilities could be made. All tests were answered
on separate IBM answer sheets. The various tests were:

Single Digit Numbers Tests (SDN).—Each of these tests consisted
of 120 items administered in two parts of 60 items each.
The time limit for each part was 90 seconds. S's task was to 
indicate whether answers of the problems as given were right 
or wrong. Sample items from each test are: 

1. Subtraction

   1. 8 - 3 = 6
   2. 7 - 4 = 3
   3. 3 - 2 = 0

2. Addition

   1. 7 + 2 = 9
   2. 5 + 6 = 12
   3. 8 + 2 = 11

3. Mixed

   1. 9 - 4 = 13
   2. 8 + 4 = 12
   3. 3 + 6 = 10
   4. 2 - 2 = 1
   5. 8 + $ = 12
   6. 7 - 1 = 6



Tests were administered in the following order: Subtraction 
(Part I); Addition (Part I); Mixed (Part I); Mixed (Part II); 
Addition (Part II); Subtraction (Part II). 

Two Digit Numbers Tests (TDN).—These three tests were
the same as their counterparts in SDN tests, except that each
of the numbers to be added or subtracted consisted of two
digits, e.g., 11 + 36 = 47. In no case were the sums greater
than two digits, although remainders were either one- or two-digit
numbers. Two and one-half minutes were allowed for
work on each part of the test. The tests were given in the fol¬ 
lowing order: Addition (I); Subtraction (I); Mixed (I); Mixed 
(II); Subtraction (II); Addition (II). 

Same-Opposite Test (SO).—This test was comprised of 60
of the more difficult pairs of words drawn from various forms
of the Army Alpha Test 6 (7). S indicated whether the words
had the same or opposite meanings. The time limit for the
test was two and one-half minutes. Sample items are:

1. acme-climax
2. ligature-band
3. abstruse-recondite


Word Completion Tests (WC).—Each of these tests consisted
of 60 items. Each test was administered in two parts of 30
items each with a time limit of 90 seconds for each part. S's
task was to select the one letter from among five alternatives
which formed a word when used with a given stem of three
letters. None of the stems were three-letter words in and of
themselves. The tasks and sample items from each test are:

1. Add final letter. (In this test S had to select the letter which
made a common four-letter word when added to the end of
the stem.)

             1 2 3 4 5
   1. cen    d c a t r
   2. lam    r b m f w

2. Add initial letter. (In this test S had to select the letter which
made a common four-letter word when added in front of the
stem.)

             1 2 3 4 5
   1. oun    s f
   2. alf    c u t a k

3. Mixed. (In this test letters might go either in front or in back
of the stem to form a word. No cues other than the stem
itself were given as to whether the answer was an initial or
final letter.)

             1 2 3 4 5



The tests were administered in the following order: Final letter 
(I); Initial letter (I); Mixed (I); Mixed (II); Initial letter (II); 
Final letter (II). 

Perceptual Speed Tests ( PS ).—Each of these tests consisted 
of 60 items administered in two parts of 30 items each. The 
time limit for each part was two minutes. Each item consisted 
of a line of 30 capital block letters. S's task was to count the 
number of times some particular letter appeared. This letter 
appeared from one to five times in each line. The number of 
times it occurred was also the answer for the item which was 
entered directly on the IBM answer sheet. Sample items from 
the various tests are: 

1. “N” test. (In this test the number of “N’s” occurring in a
line of “M’s” was counted.)

   1. M M M M M M M M N M M M M M N M...
   2. M N M M M M M N M M M M M N M M...

2. “W” test. (In this test the number of “W’s” occurring in a
line of “M’s” was counted.)

   1. M M M M M M M M M W M M M M M M M M...
   2. M W M M M M M M W M M W M M M M W M...

3. Mixed. (In this test each line consisted of “M’s”, “N’s”,
and “W’s”. At the beginning of each line the letter to be
counted was indicated in parentheses.)

   1. (W) M M M M N M M M W M M N W M N N M...
   2. (N) N M M M M W N M N W M M M M M M M...
   3. (N) M M M M M N N W M M M N M M W M W...

The order in which the tests were administered was: “IV 1 s" 
(I);"#"s” (I); Mixed (I); Mixed (II);“/Ts” (II); “IW’ (II). 
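An item of the kind described for the Perceptual Speed tests can be generated mechanically. The sketch below follows the stated constraints (30 letters per line, the target letter appearing from one to five times); the function name, the random seed, and the use of a single filler letter are assumptions made for illustration, and a Mixed item would need a third letter as distractor.

```python
import random


def make_item(rng, target="N", filler="M", length=30):
    """Return (line, answer): `length` letters with 1-5 copies of `target`."""
    answer = rng.randint(1, 5)
    # Choose distinct positions for the target letter
    positions = set(rng.sample(range(length), answer))
    letters = [target if i in positions else filler for i in range(length)]
    return " ".join(letters), answer


rng = random.Random(0)
line, answer = make_item(rng)
print(line)
print(answer)
```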

Population 

The test battery was administered to 205 college students.
These were tested in groups ranging in size from eight to 48
S's. These tests were given to 104 S's in the order in which
they are described above, while 101 were given the tests in the
reverse order. Since mean scores of groups showed no significant
differences attributable to order, all data were combined
into one group.

Results 

Table 1 shows the zero-order intercorrelations of the tests in
the battery. It will be noted that all correlations are positive,
ranging in size from .908 (r a s ) to .072 (r 7-11). Raw-score means
and S.D.'s are also presented in this table. Examination of
these means shows that alternation or mixed tasks in the
number tests appear to be of the same order of difficulty as
the straight addition and subtraction tasks. This is true for
both the Single Digit and the Two Digit tests. This result
would not have been anticipated from results of the Benzedrine
study. In that study mixed addition and subtraction
problems were markedly more difficult than either the addition
or the subtraction tests alone. Tests used in the earlier study
were just like the Single Digit Numbers tests except that S
wrote his answer beside the problem instead of indicating on a
separate sheet whether or not the given answer was correct.
Apparently it is just this difference that accounts for the divergence
in results obtained here. In both the Word Completion
tests and the Perceptual Speed tests the mixed sections yield
significantly lower scores than do the other sections.

TABLE 2

Factor Loadings and Communalities of Variables in Flexibility Battery

                 Centroid Loadings                          Rotated Loadings
Test    I       II      III     IV      h²      S\      I       II      III     IV      h²

 1      717     -306    341     -446    791     80'     334     149     799     156     796
 2      777     “•at)!  350     -259    878     «7,!l   337     196     836     160
 3      801     -048    197     -413    837     »4      J<>8    436     780     194     836
 4      847     221     271     436     896     90,     487     417     404     1691    i»97
 5      854     40J     191     238     91.1    Qi;     293     403     .,40    697     1916
 6      864     198     164     449     908     9']     3*4     417     441     O78     1909
 7      367     309     -°94    -086    446     3°'!    °57     464     0O4     143     243
 8      700     388     -477    -443    767     751     253     «<5     >52     109     7&3
 9      694     427     40J     -004    747     74i     198             174     169     743
10      OO4     354     -262    -211    <x>3            214     73'     110     077     600
11      ssi     -429    -498    448     648     64'     776     029     129     081     646
12      636     -4^3    -376    233     779     78,!    8&3     105     116     081     776
13      713     -408    “313    247     840     83!!'   876     134     177     150     839

Table 2 contains the centroid and the rotated factor loadings. 
Four factors were extracted from the intercorrelation matrix. 
No significant residuals remained in the fourth-factor residuals. 
As a matter of fact, the fourth factor itself contributes little 
to any correlation. Rotations of these factors were made to 
satisfy, insofar as possible, criteria of simple structure and positive
manifold. It is apparent that those tests which had been
designed to measure flexibility (tests 3, 6, 10, and 13) do not
group themselves along an independent axis, but rather can 
be accounted for in terms of number, perceptual, and verbal 
factors depending upon the type of test. The factors are rela¬ 
tively easy to identify according to the nature of the task. 

Factor I (Perceptual Speed-P). The highest loadings in this
factor are found in the PS tests (tests 11, 12, and 13). It is not
surprising that the SDN tests (tests 1, 2, and 3) show some
saturation in this factor. The simplicity of the problems, for college
groups at least, is such that perceptual speed might well
influence the speed with which correct and incorrect answers
are recognized.

Factor II (Verbal-V). Tests 7, 8, 9, and 10 show the highest
loadings in this factor. Test 7, Same-Opposite, shows less loading
in this factor than might be expected, but, none the less,
its relationship to the word completion tests is unmistakable.
The TDN tests show minor loadings in this factor. Some
correlation between verbal and numerical factors has often been
observed. This relationship may depend on the complexity of
the numerical task, since the SDN tests show practically no
loading on this factor.

Factor III (Single Digit Number [SDN]). Clearly, tests
1, 2, and 3 have the highest loading in this factor. Considering
the apparent similarity between the SDN tests and the TDN
tests one might have expected the latter to exhibit higher
loadings in this factor. The TDN tests, however, came out on a
factor of their own.

Factor IV (Two Digit Number [TDN]). This factor shows
the highest loadings in the TDN tests. Relatively little of the
variance of other tests in the complete battery can be
accounted for on the basis of this factor.

Table 3 shows the per cent of the total variance of each test
attributable to each factor. Thus, 64 per cent of the variance
of test 1 (SDN, subtraction) can be accounted for by the SDN
factor, 11 per cent by the factor P, and only 4 or 5 per cent by
both factors V and TDN combined. The sixth column (h²)
shows the communalities computed from the rotated factor
loadings for each test, or the per cent of variance in each test
accounted for by the four common factors. The specificity
(Sp) of each test (column seven) is the difference between the
reliability of the test (r11, column eight) and h². (Since h²
cannot be greater than r11, when this occurs it is apparently due to
slight errors in estimating one or the other value.) It will be
noted that Sp is close to zero for all tests with the exception of
the Same-Opposite test. Since the Same-Opposite test was a
speed test and since it was given in only one part, no reliability
was computed. However, it was assumed that the reliability
might be near .80. If this estimate is not seriously in error,
then the test shows a high degree of specificity. Apparently the
verbal factor isolated here is not the only one necessary to
explain scores on a rather difficult word meaning test.


TABLE 3

Factor Variance Accounted for by Factors Isolated in Flexibility Battery


Error variance (E) is shown in the last column of Table 3.
It is the difference between 1.00 (assumed total variance of
each test) and the reliability coefficient (non-error variance).
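
The partition of variance described above can be sketched in a few lines. This is only an illustration: the loadings and the reliability below are hypothetical round figures chosen to resemble the percentages quoted for test 1, not entries copied from the tables, and the function name is an assumption of this sketch.

```python
def variance_breakdown(loadings, reliability):
    """Split a test's unit variance into factor shares, h^2, Sp, and error.

    loadings    -- mapping of factor name to rotated loading (hypothetical here)
    reliability -- reliability coefficient of the test (hypothetical here)
    """
    # Share of variance due to each factor is the squared loading.
    shares = {name: a * a for name, a in loadings.items()}
    # Communality: variance accounted for by all common factors together.
    communality = sum(shares.values())
    # Specificity: reliable variance not explained by the common factors.
    specificity = reliability - communality
    # Error variance: what remains of the assumed total variance of 1.00.
    error = 1.0 - reliability
    return shares, communality, specificity, error


# Hypothetical values, for illustration only.
loadings = {"P": 0.33, "V": 0.15, "SDN": 0.80, "TDN": 0.15}
shares, h2, sp, err = variance_breakdown(loadings, reliability=0.85)

for name, share in shares.items():
    print(f"{name}: {100 * share:.0f} per cent")
print(f"h2 = {h2:.3f}, Sp = {sp:.3f}, error = {err:.3f}")
```

A negative Sp from such a calculation would signal the estimation error mentioned in the text, since the communality cannot truly exceed the reliability.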

Discussion 

The analysis of the data presented above suggests strongly
that no factor of flexibility need be postulated to account for
differences in performance on these simple alternation tasks.
Since Spearman spoke so enthusiastically about the factor of
perseveration, relatively little evidence has been put forth to
substantiate the hypothesis (5). Most previous investigators
have attempted to investigate the problem by using batteries
of tests which had been designed to measure anything that
might possibly be considered under the term perseveration.
Notcutt, for example, used a battery of 15 tests designed to
measure sensory perseveration, motor perseveration (both
creative effort and alternation type), and associative perseveration
(4). Cattell likewise has used batteries of fairly complex tests;
indeed, Notcutt borrowed extensively from Cattell’s tests (1).
It would seem that the aim of these investigators differed
from ours. They were attempting to identify a general
perseveration factor which would influence the total behavior of
the person. Their postulate, based on Spearman’s “Law of
Inertia,” led them to hope that this factor would pervade all
sensory and motor activities. It should be found in learning
and would be an important factor of temperament and as such
should influence feeling, attitude, apperception, and even the
“natural rhythm” of the individual. Experimental results do
not support such a general factor (2).

In this study we have steered clear of perseveration conceived 
of as an all-pervasive factor. This concept has been avoided 
by the adoption of the term flexibility, and with the use of 
simple tests scored in a simple way. At this level of simplicity 
it is quite evident that a general factor of flexibility does not 
exist. Perhaps it may be demonstrated if a more complicated 
series of shifts between equally well-established habit patterns 
were to be required of S. Thus within a single factor area, say 
Number, S could be tested in addition, subtraction,
multiplication, division, and various combinations of these functions.
To this could be added various ways in which the problems 
could be presented. Flexibility would be demonstrated if the 
mixed tests should group themselves along an independent 
axis. 

Notcutt presents results which appear at first glance to be
at variance with our findings (2). He states that in his battery
of tests the alternation tasks “reveal a genuine though small
factor.” His method of analysis involved the averaging of the
intercorrelations of five alternation tasks. The average
correlation thus obtained was 0.181 ± 0.030. The tasks required in
these tests were such things as writing Id’s then tti’s, and
writing ABCD, then abcd. The method used in this analysis seems
somewhat tenuous for this result to be accepted with
confidence. It would be interesting to determine what factors would
emerge from an analysis of his intercorrelation matrix.

Scoring. Earlier writers have been greatly concerned about
the problem of scoring alternation tests. Since they were
interested in the hindering or facilitating effects of the
alternation task as compared to the homogeneous task, scores were
sought to express this relationship. Cattell criticizes the use of
the simple difference score (X - Y) by pointing out that this
score will be highly correlated with speed (1). Thus if X =
the score on the critical task and Y = the score on the
homogeneous task, a slow worker would get a smaller flexibility score
than a fast worker even though both experienced an equal
amount of interference. Therefore, he used the ratio X/Y in
scoring his tests. This scoring is satisfactory as long as one is
working with tests requiring S to overcome habitual sets such
as Cattell’s triangle, reversed letter, and cancellation tests.
Thus, on his reversed letter test Y would equal the score on
writing the letters opqrst and X would equal the score on
writing these letters in the reversed order tsrqpo. Walker,
Staines, and Kenna, however, point out that this method
cannot be used for alternation tests such as those in the present
study (6). To do so one would let Y = the combined score on
the homogeneous tasks, e.g., Addition plus Subtraction, and
X = the score on the mixed task. The scoring formula X/Y
under these conditions can give an accurate picture of an
interference effect only if the speed of work on the two
homogeneous tasks is equal. If one of the tasks is inherently more
difficult than the other, this method of scoring will show an
artifactual interference effect even though none may exist. 5
They suggest, therefore, that the

Interference Score = E/A

where E = expected score on the alternation task,
and A = actual score on the alternation task.


5 As the authors point out, if S could do 60 addition problems in one minute and
only 30 subtraction problems in one minute, his rate for addition would be one
problem a second, and for subtraction one problem every two seconds. If S were doing both
addition and subtraction (mixed) in one minute he should do 40 problems, and in two
minutes 80 problems. On the other hand, if he spends one minute doing addition
problems and one minute doing subtraction problems, he will finish a total of 90
problems. Here the scoring formula X/Y = 80/90 = .89. The interference shown is an
artifact.
Thus if 60 seconds is devoted to each task,

$$E = \frac{60}{T_1 + T_2}.$$

$T_1$ = the time to do each unit of the first task and is given by
the formula

$$T_1 = \frac{60}{\text{score on the first activity}}$$

and

$$T_2 = \frac{60}{\text{score on the second activity}}.$$
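
As a worked illustration of this scoring, the sketch below takes the expected score to be E = 60/(T1 + T2), with each unit time equal to 60 divided by the score on the corresponding homogeneous task. The function names and the sample scores are assumptions made for the example only.

```python
def unit_time(score, seconds=60.0):
    """Seconds needed per unit of work, given the score earned in `seconds`."""
    return seconds / score


def expected_alternation_score(score_first, score_second, seconds=60.0):
    """E = seconds / (T1 + T2), built from the two homogeneous-task scores."""
    t1 = unit_time(score_first, seconds)
    t2 = unit_time(score_second, seconds)
    return seconds / (t1 + t2)


def interference_score(score_first, score_second, actual, seconds=60.0):
    """Interference Score = E / A, where A is the actual alternation score."""
    return expected_alternation_score(score_first, score_second, seconds) / actual


# The footnote's worker: 60 additions and 30 subtractions per minute,
# so T1 = 1 second and T2 = 2 seconds.
e_footnote = expected_alternation_score(60, 30)

# Hypothetical worker for whom both tasks and the mixed task are equally
# difficult (40 problems per minute on each): E/A comes out at .50.
ratio_equal = interference_score(40, 40, actual=40)

print(e_footnote, ratio_equal)
```

With equally difficult tasks the ratio is .50, which agrees with the baseline value the discussion of the E/A ratio relies on; the X/Y ratio of 80/90 in the footnote, by contrast, falls below 1 even though no interference was assumed.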


While the logic of these scoring methods is sound, it does not
seem necessary to resort to them in correlational studies. In
fact their use makes it very difficult to determine the
relationship between the flexibility and the nonflexibility tasks. Thus,
while the use of such scores makes it possible to determine
whether or not interference exists, it is impossible to determine
from them whether or not the interference is uniformly
experienced by all (high positive correlation) or is a factor
unrelated to performance on the nonflexibility tasks. On the
other hand, should the E/A ratio fail to give indication of
interference, it still does not seem admissible to conclude that
interference was not a factor. Thus, using this ratio on the
obtained mean scores on the SDN tests (Table 1), an interference
score (E/A) of .53 is obtained and on the WC test we get an
interference score of .92. In both tests, of course, if the mixed
tasks and the single tasks were both equally difficult, E/A
should equal .50. At first glance, therefore, it would seem that
flexibility could be a factor on the WC tests but not on the SDN
tests. This is an erroneous impression, for in spite of the high
average score on the SDN mixed test, it is necessary to know
how the mixed test correlates with both the Addition and
the Subtraction tests. If these correlations were appreciably
lower than the correlations between the Addition and the
Subtraction tests, it would indicate that some factor other than
ability to add and subtract entered into the mixed task to
lower the correlations. These correlations are, however,
essentially equal. The same may be said for the WC tests. In this
case since r8-9, r8-10, and r9-10 are all about equal in magnitude,
indications are that a factor of flexibility need not be postulated
to account for the relatively slow performance on the Mixed
tests. Our factor analysis, of course, confirms this throughout
the battery. Thus, it is felt that in factorial investigations
scoring formulae such as those described above are not only
unnecessary but that actually these procedures may obscure
data which can give valuable information.

Summary and Conclusion 

The purpose of this study was to investigate the nature of
flexibility by factorial methods. A battery of 13 tests was
constructed. These tests were designed to measure numerical,
perceptual speed, and verbal factors. Within each area the attempt
was to make some of the tests univocal (factorially pure).
One test of each type, however, was designed to measure
flexibility by requiring S to shift from one simple task to another.
The tests were designed for machine-scoring and were speed
tests. Ss were 205 college students. Test scores were
intercorrelated and the matrix of intercorrelations was factorially
analyzed. Four factors were extracted. These were identified
as: (P) Perception, (V) Verbal, (SDN) Single Digit Number,
and (TDN) Two Digit Number. Those tests which required
a shifting of tasks could be accounted for on the basis of the
above four factors; consequently the postulated factor of
flexibility common to the different types of tasks was not necessary
to account for the obtained results.

In reference to scoring, the position is maintained that the 
various difference scores and ratio-scoring techniques used by 
other investigators are not necessary in factorial investigations 
of flexibility and indeed may obscure the essential relationship 
between the flexibility and nonflexibility performance. 
THE STANDARDIZATION OF THE MOORE EYE-HAND
COORDINATION AND COLOR MATCHING TEST 1


JOSEPH E. MOORE
Georgia Institute of Technology 

The Moore Eye-Hand Coordination and Color-Matching Test
was originally developed to measure the speed of eye-hand
coordination of small children, and it was found in subsequent
studies to differentiate clearly the same factor in adults. It
was thought that if a test of eye-hand coordination could be
devised which would stimulate immediate interest, it would
prove valuable in measuring certain differences in young
children in whom this type of learning has not occurred to any
great extent, or has not become highly specialized.

In order to devise a test which would appeal strongly to
young children it was decided to utilize their interest in
marbles. The first test that was constructed was a bulky affair
and difficult to manipulate. By a process of trial and revision
the instrument has been markedly changed and, it is to be
hoped, improved. The pre-school and the adult tests are identical
except in length. The pre-school, or short form, has been used
to test both white and Negro children as young as two years of
age. Motivation is rather easy, since children generally take a
keen delight in picking up the marbles and putting them in the
holes.


1 This project was made possible in part through a grant-in-aid allocated by a
Research Committee at the Georgia Institute of Technology from funds made available
jointly by the Carnegie Foundation and Georgia Institute of Technology. The author,
however, and not Georgia Institute of Technology, is solely responsible for statements
made in this report.

The writer wishes to acknowledge the assistance and cooperation of the following
individuals: President Robert P. Daniel and Mr. William N. Smith, Personnel
Counselor, Shaw University; Prof. lloriuda Duncan, Tuskegee Institute; Dr. Susan Chap,
George Peabody College; Dr. Sidney Q. Janus and Prof. Albert S. (Hickman, Georgia
Institute of Technology; Prof. Ikrmnn Long, Fisk University; Dr. C. W. 'Jhnmnsson,
Drexel Institute; Dr. R. R. Ullman, Wittenberg College; and Prof. Joseph L. Whiting,
Atlanta University.
The following picture shows the adult or long form of the test.



Fig. 1. Eye-Hand Coordination and Color Matching Test

It will be seen that the test is a rectangular board 16[illegible] x
[illegible] inches. The thickness of the test is slightly over
[illegible] inch. There are four rows of one-half inch holes. Each row
contains eight holes spaced [illegible] inches apart. There are four starting
boxes or slots holding the marbles, one at the end of each row
of holes. Each box holds eight marbles for the Speed Test and
twelve marbles for the color-matching part of the test.

The Color-Matching Test operates in the following way: 
Under each hole there is a colored piece of paper covered by 
transparent tape. The colors, in order, are red, green, blue, and 
yellow for the first row of eight holes. The second row is green, 
blue, yellow, red, the color sequence being different for each 
row. 

The Pre-School Test is actually half as long (16 marbles in
each trial are used instead of 32) as the adult form. The child
is seated comfortably and is told to watch as the examiner shows
him how to play the game. The examiner then takes one
marble at a time and puts it in the hole so as to give the impression
that it is fun to play the “game” fast. The child is then
permitted to take a practice trial on the first eight marbles. The
test score is the total number of seconds it takes a child to do
the 16-hole test three times, placing the marbles in consecutive
order. A Pre-School Test can be made by covering one-half
of the Adult Test with a piece of cardboard.

The norms for the Pre-School Test were based on the scores 
of children from nursery schools, kindergartens, and
lower-elementary schools from the states of Tennessee and Kentucky.

From Table 1 it is seen that the average time for each age
group becomes progressively faster. Comparison of the means
with medians shows that every group except one is negatively
skewed. 2 The range in scores indicates the extremes that are
found in the reaction time of children within the age range
studied. The standard deviation tends to become progressively
smaller for each succeeding higher age group represented in
the sample.

TABLE 1

Speed, Measured in Seconds, of Eye-Hand Coordination of 431 Children on the
Pre-School Form

Age in     Number of    Range of                       Standard
Months     Children     Speed       Median    Mean     Deviation
24-29         10        141-448     215       225        94.2
30-35         25        122-372     177.5     177.5      58.2
36-41         45         88-275     142       156.7      48.7
42-47         56         90-210     135       137.6      26.7
48-53         47         81-205     120.9     123.3      29.4
54-59         54         76-222     106.3     112        27.0
60-65         49         75-191     100       108        26.2
66-71         38         66-146      88        93.5      22.5
72-77         78         60-110      83.4      84.6      10.7
78-83         29         65-95       79.1      80.9       7.6
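
The two statements made about Table 1 (steadily faster averages, and a mean lying on the slow side of the median in all groups but one) can be checked mechanically. The sketch below is illustrative only; it assumes the entries printed as "93 5" and "84 6" read 93.5 and 84.6 and that the third age group is 36-41 months, and the function names are not from the original.

```python
# (age range in months, median seconds, mean seconds), as read from Table 1.
TABLE_1 = [
    ("24-29", 215.0, 225.0),
    ("30-35", 177.5, 177.5),
    ("36-41", 142.0, 156.7),
    ("42-47", 135.0, 137.6),
    ("48-53", 120.9, 123.3),
    ("54-59", 106.3, 112.0),
    ("60-65", 100.0, 108.0),
    ("66-71", 88.0, 93.5),
    ("72-77", 83.4, 84.6),
    ("78-83", 79.1, 80.9),
]


def mean_exceeds_median(rows):
    """Age groups whose mean time is slower (larger) than the median."""
    return [age for age, median, mean in rows if mean > median]


def means_strictly_decrease(rows):
    """True when every older group has a faster (smaller) mean time."""
    means = [mean for _, _, mean in rows]
    return all(a > b for a, b in zip(means, means[1:]))


skewed = mean_exceeds_median(TABLE_1)
print(len(skewed), "of", len(TABLE_1), "groups have mean > median")
print("means decrease with age:", means_strictly_decrease(TABLE_1))
```

Because slow scores are plotted at the left in this report, a mean pulled toward the slow end corresponds to what the text calls negative skew.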

The Long Form or Test for Adults 

The long form of the test requires placing 32 marbles, one at
a time, in consecutive order in the holes. The test is taken
in a seated position and has been standardized at typing-table
height, or approximately 26 inches. The subject first has a
practice trial of a row of eight marbles. The individual's score
is the total number of seconds it takes him to complete three
runs of 32 marbles each.

The long form of the test has been employed to measure the
speed of eye-hand coordination of children in both elementary
and high schools. The data on the performance of individuals
between the ages of six years and sixteen years are presented
in Table 2.

2 Scores represent the number of seconds necessary to do the test. The fewer the
seconds the faster the performance. The distribution therefore represents slow scores
at the left and fast scores at the right.

Each age group represented in the above sample (Table 2)
completes the test progressively faster than the next younger
age group. If the small sub-group samples are representative
of the corresponding larger populations it would seem that the
smallest changes in speed and precision occur in the younger
age groups, ages six through ten, and the greatest between the
ages of eleven to sixteen. It will be noted that four of the age
groups are positively skewed, the median being larger than the
mean. The age groups which are positively skewed are six, seven,
thirteen, and sixteen. The small sampling could account for a
part or all of the skewness.

TABLE 2

Speed, in Seconds, of Eye-Hand Coordination for 602 Subjects Aged Six
Through Sixteen Years

(? = entry illegible in this copy)

        Number of
Age     Subjects     Range      Median    Mean     S.D.
 6         ?         121-209    166.3     181.6    19.7
 7        88         119-195    145.0     144.5    ?5.5
 8         ?         105-175    153.0     135.0    15.7
 9         ?            ?       130.0     132.4    14.9
10         ?            ?       110.0     123.3    17.1
11        38            ?       110.5     118.1     7.7
12        43          ?-150     115.4     116.1    13.1
13        43         84-130     110.8     10?.7     9.7
14        28         81-141     107.3     108.9    17.9
15         ?          ?-133     103.8     101.5    11.7
16         ?         75-125      95.4      93.3      ?

* Detailed norms are given in the Manuals.

Data available on the long form of the test indicate that it 
can be considered reasonably well standardized, at least on 
Southern men and women. The data from two Northern schools, 
Drexel Institute and Wittenberg College, are so similar that it 
does not appear that any great divergence of central tendencies 
and variability are to be expected in other areas. Further studies
are encouraged, however, to prove the accuracy of this
assumption. 

The data that have been accumulated on the long form of the
Moore Eye-Hand Coordination and Color-Matching Test are
presented in detail for adult subjects in Table 3. Separate
norms have been presented for white and Negro subjects. The
justification for the separate norms for whites and Negroes was
the fact that the difference between the average time of the
two groups favored the whites on both the speed and the
color-matching tests. The difference in performance of the whites
was statistically significant at the 1 per cent level for all
comparisons except that between white women and Negro women
on speed, in which instance the difference was not statistically
significant.

Table 3 reveals that on the speed of eye-hand reaction,
women are faster than men. White men did the test more
rapidly than Negro men and white women did the test more
rapidly than Negro women. These differences favoring the
white men are statistically significant. The difference between
the means of college women favored the faster performance of
the white women but the difference is not statistically
significant. Negro women performed the test somewhat more rapidly
than did men in the college groups. The greatest differences in
speed of eye-hand coordination are between college and
non-college groups rather than between racial groups.


TABLE 3

Norms for Adults on the Long Form of the Moore Speed of Eye-Hand Coordination Test

(? = entry illegible or missing in this copy)

                                 White Men                        Negro Men
                      College  Non-College  Bus. & Ind.     College  Non-College
Number of Subjects      776       2,707        1,222          451       108
Range                  73-123     70-180       75-175        80-130    83-180
Median                  96.18     104.30       104.04        100.92    109.0
Mean                      ?       106.00       103.20         99.20    111.0
S.D.                     8.85      1?.00        10.48          9.17     14.5

                            White Women             Negro Women
                      College    Bus. & Ind.          College
Number of Subjects      324         348                 280
Range                  74-144      74-131              64-136
Median                  94.27       98.5                93.61
Mean                    95.59       99.0                96.23
S.D.                     9.55        9.25               10.13


The non-college white males were men who came through 
the Georgia Tech Guidance Center and were being considered 
for work calling for some type of manipulative skill. The scores 
of these men were negatively skewed. In short, these men did 
fairly well in performance calling for quick and accurate
manipulation insofar as such factors were measured by the Moore
test. It will be seen that Negro college men also worked much 
faster than the non-college Negro group. 
The Color-Matching Test

The Color-Matching Test has been developed during the
last eight years at the request of certain industrial firms. The
test requires that the individual match a marble of a specific
color with a hole of the same color. The colors, as were
mentioned previously, are arranged in an irregular order. The four
colors used are red, green, blue, and yellow. The score on the
color-matching part of the test is the number of seconds
required to complete the test; that is, to match the 32 marbles
with the 32 colored holes three times. If a mistake is made,
such as placing a red marble in a yellow hole, one second is
added to his score for each such error. The subject is not
permitted to arrange the marbles in a definite order previous to
the starting signal.


TABLE 4

Norms for Adults on Speed of Color-Matching

(? = entry or heading illegible or missing in this copy)

                                 White Men                   Negro Men
                         ?          ?          ?         College  Non-College
Number of Subjects     1,181       701         ?           451        81
Range                  87-170       ?          ?          91-138    100-180
Median                 114.50       ?          ?          113.72     1?1.0
Mean                   115.54       ?          ?          115.01     138.0
S.D.                    11.30      14.8       18.41        15.35      36.0

                      White Women      Negro Women
                        College          College
Number of Subjects         ?               180
Range                      ?              84-190
Median                  110.02            118.10
Mean                    110.01              ?
S.D.                     11.70             10.10

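
The color-matching scoring rule described above (total seconds over three trials, plus one second for each misplaced marble) can be written out as a brief sketch. The function name and the sample timings are hypothetical.

```python
def color_matching_score(trial_seconds, errors):
    """Total seconds for the three trials plus a one-second penalty per error.

    trial_seconds -- times for the three runs of 32 marbles (hypothetical values)
    errors        -- number of marbles placed in a hole of the wrong color
    """
    if len(trial_seconds) != 3:
        raise ValueError("the score is based on exactly three trials")
    if errors < 0:
        raise ValueError("the number of errors cannot be negative")
    return sum(trial_seconds) + 1.0 * errors


# Hypothetical subject: three timed runs and two color errors.
score = color_matching_score([40.0, 38.5, 37.5], errors=2)
print(score)
```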

Table 4 presents the data on the color-matching test with 
separate norms for white and Negro groups. The white college 
groups did the test more rapidly than the Negro college groups. 
The differences between the respective means were statistically 
significant at the 1 per cent level. 

As would be expected, color-matching takes longer than the
simple Speed Test. It takes an individual between five and ten
seconds longer per trial to match the colors, or from fifteen to
thirty seconds longer for the three trials. A comparison of
Tables 3 and 4 reveals that the differences among the various
groups are more pronounced on the Color-Matching Test than
on the simple Speed Test; especially is this the case in the
comparison of college and non-college white men.

Validity 

The validity of the Moore Eye-Hand Coordination Test was 
investigated in a number of ways. Age differentiation was one 
criterion. The test differentiated between the various age groups 
from 24 months to sixteen years and older. As each group of 
subjects took the test those who were older tended to make 
faster scores. After the sixteenth year age did not seem to have 
any appreciable relation to speed on the groups included in 
this study. 

In the business and industrial field two studies are available
on validity. In one study ten ice cream sandwich makers took
the Speed Test and the scores were correlated with the number
of dozens of sandwiches each turned out in a specified time. The
coefficient of correlation was .52. The second study dealt with
23 loom operators or weavers. The group was divided into
those above and below average on the speed of color-matching
and above and below $1.18 in hourly earning rate. A
tetrachoric correlation of .86 was obtained.
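
For readers unfamiliar with the tetrachoric coefficient, the sketch below estimates it from a fourfold table by the cosine-pi approximation. The report does not say which computing method was used, so the approximation is an assumption of this example, and the cell counts are hypothetical figures that merely sum to 23; the weavers' actual table is not given.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    Fourfold table layout (counts):
        a = above average on both variables   b = above on first only
        c = above on second only              d = below average on both
    """
    if b == 0 or c == 0:
        return 1.0   # no discordant cases in one off-diagonal cell
    if a == 0 or d == 0:
        return -1.0  # no concordant cases in one diagonal cell
    ratio = math.sqrt((a * d) / (b * c))
    return math.cos(math.pi / (1.0 + ratio))


# Hypothetical split of 23 workers, for illustration only.
r_tet = tetrachoric_cos_pi(a=10, b=2, c=2, d=9)
print(round(r_tet, 2))
```

With so few cases a coefficient of this kind has a large sampling error, so a value computed this way is best read as a rough indication of association.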

The correlations of the Moore tests with other dexterity
tests were also used as indirect evidences of validity. The speed
of eye-hand coordination correlated .51 with the Pennsylvania
Bi-Manual Work Sample (assembly) on 317 adult male
subjects. On the Minnesota Rate of Manipulation Test the
coefficient for placing was .67 for 157 subjects, and for turning it
was .45 for 191 cases. The color-matching part of the Moore
Test gave the following correlation coefficients with other tests:
O'Connor Tweezer, .54 for 133 subjects; Minnesota Rate of
Manipulation (placing), .53 for 103 men. The Pennsylvania
Bi-Manual Work Sample (assembly) correlated .51 for 237
individuals and for disassembly, .50.

Reliability 

The reliability of each of the three trials of the test was also
studied on 441 men. The scores for trial one (32 marbles) were
correlated with the scores for the second trial (32 marbles), and
a coefficient of .83 was found. Scores for trial two were
correlated against scores for trial three and a coefficient of .77 was
obtained. Scores for trial one were then correlated with scores
for trial three and a coefficient of .82 was revealed. From these
data it would appear that the instrument is doing a fairly
consistent job of testing, even if the obtained speed on the first 32
marbles is taken as a criterion of actual speed.

The reliability of the color-matching part of the test was
computed on the scores of 83 college juniors and seniors by the
test-retest method. Total scores (96 marbles) obtained one
week apart were found to give a coefficient of correlation of
.82. When this is corrected for the restricted range, the
coefficient becomes .955.

The speed and color-matching parts of the Moore Speed of
Eye-Hand Coordination and Color-Matching Test yielded a
correlation coefficient of .67 with each other on a group of
364 adults. It would appear that the speed element is playing
a major part in both tests.

The reliability of the Pre-School Form was checked on 81
children drawn from the pre-school group and the first two
grades of elementary school. The test-retest method after a
period of one week gave a coefficient of reliability of .95.

Summary 

1. The Pre-School Form of the Moore Speed of Eye-Hand
Coordination Test differentiates the performance of children
between ages of 24 and 72 months. The reliability of the
Pre-School Test for 81 children by the test-retest method after one
week was .95.

2. The long form of the test is able to differentiate between
each age group for ages six through fifteen years. After the
sixteenth year speed does not appear to be very closely related to
age for the groups included in this study.

3. The validity of the Speed Test has been investigated by
using such criteria as age differentiation, correlation of the
speed and production of a group of ice cream sandwich makers
(.52), and correlation with the Minnesota Rate of Manipulation
(placing) for 157 subjects (.67). On the Color-Matching Test
a group of 23 weavers was divided into above- and
below-average groups on the color-matching test scores and above- and
below-average groups on hourly earning rate, and yielded a
tetrachoric correlation coefficient of .86.
4. The reliability of the Moore Test was determined by the
test-retest method after a lapse of one week. The correlation
coefficients ranged from .95 to .72.

5. Coefficients of correlation between each of the three
separate trials were obtained on a group of 441 men and used as
partial measures of reliability. Trial one correlated with trial
two showed a coefficient of .83; trial two with trial three, .77;
and trial one with trial three, .82.

The Moore Eye-Hand Coordination and Color-Matching Test is
produced and distributed by The California Test Bureau.



AN INVESTIGATION OF A COUNSELOR ATTITUDE 
QUESTIONNAIRE 1 

WILLIAM A. McCLELLAND
Brown University
and
H. WALLACE SINAIKO
New York University

Introduction 

Attitudes held by a counselor toward his own behavior in
counseling situations, and toward various counseling techniques,
can have marked implications for effective counseling.
However, definitive studies of counselor behavior are practically
non-existent. It is the purpose of this study to determine the
effectiveness of one technique in the quantitative measurement
of these attitudes. Several applications of this technique, the
questionnaire method, have been attempted and will be
discussed.

Implicit in the questionnaire approach to the investigation
of counselor attitudes is the assumption that “correct”
responses can be determined. In an investigation of this problem
Chase 1 had 34 judges, “selected because of their known
understanding of and ability in counseling,” respond to a 101-item
Questionnaire he had constructed. Typical items from the
Chase Questionnaire are as follows:

1 2 3 4 5 Permitting the counselee to express himself freely.

1 2 3 4 5 Reprimanding the counselee for displaying aggression.

1 2 3 4 5 Advising the counselee to stay on the safe side and
not take chances.

The five numbers before each item represent the counselors’
attitude toward the practice as follows: 1, Decidedly harmful;
2, Probably harmful; 3, Doubtful; 4, Probably good; 5,
Decidedly good. A scoring key was developed by counting as
“correct” all single ratings of items that received a clear
majority of the judges’ responses. If two adjacent ratings of an item
received a clear majority, both were scored as “correct”
responses. Chase was then able to give his Questionnaire to
counselor trainees, to compare their responses with those of
the 34 judges, and to derive quantitative “scores” (or indices
of agreement) with the judges.

1 This paper was presented at the Midwestern Psychological Association meeting
May 8, 1948.

* Chase, Wilton P. “Measurement of Attitudes Toward Counseling,” Educational
and Psychological Measurement, VI (1946), 467-473.

The above method of determining “correct” responses in an 
attitude questionnaire is subject to qualification. First, keying 
items is undoubtedly some function of the judges’ training and 
experience, their temperaments, and their philosophies of coun¬ 
seling. Thus, it is reasonable to expect considerable variation 
in keys derived from different groups of judges. Second, the 
original “set” given the judges might well influence their re¬ 
sponses. Chase instructed the judges to “Keep in mind that in 
every one of the items a general situation is described, and one 
is therefore not to think in terms of individual cases.” But 
counseling is not in terms of a general situation. It is con¬ 
ceivable that the same counselor dealing with a very dependent 
student would behave in one way, yet in counseling a how-to- 
study case might perform quite differently. In short, does the 
“general counseling situation” exist at all? If the same coun¬ 
selor can have different sets of attitudes for different counseling 
contacts, then there might be need for as many “correct” 
questionnaire responses as there are diagnostic categories and 
counseling philosophies. 

For purposes of the present study it was assumed that judges 
are capable of making responses to the Chase Questionnaire 
items in terms of a general counseling situation. If a meaning¬ 
ful key could be derived, the use of a questionnaire of this 
type might be of considerable value in the selection and train¬ 
ing of counselors. In an agency where counselors specialize in 
one or more problem areas, inspection of the individual re¬ 
sponses and total score would be of value in the placement of 
counselors. In a counselor-training situation early identification 
of attitudes at variance with local, empirically defined, “cor¬ 
rect” attitudes might facilitate the orientation of the training. 
These are two possible uses of such an instrument. 

Method and Subject

A list of expert counselors, each of whom was to re-evaluate 
the Chase Questionnaire items, was compiled. The group in¬ 
cluded only persons who had had at least ten years’ counseling 
experience in and around the University of Minnesota, or who 
had obtained the Ph.D. degree in Personnel Psychology at that 
institution. The thirteen expert counselors selected may be 
characterized as homogeneous by training and counseling ex¬ 
perience. Four of the group were academic instructors in coun¬ 
seling courses, four were full-time counselors, and five divided 
their time between personnel administration and counseling. 
The judges were given a shortened form of the Chase Questionnaire:
ten of the original 74 scorable items were eliminated because
they were specific to military separation counseling.

A “Minnesota key” for scoring the questionnaires was obtained
as follows: The mean and standard deviation were computed
for each of the 64 items. Responses were weighted on
the five-point scale described above. Those items were eliminated
which had a standard deviation of .8 or larger (arbitrarily
selected, since these items could have more than two
“correct” responses). Inspection of the distributions of judges’
ratings supplemented this application of summary statistics.
Forty items remained on which there seemed sufficient agreement
between the judges so that either one single or two adjacent
ratings could be scored as “correct.”
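
The keying procedure just described can be sketched in code. The following Python is only an illustration under stated assumptions, not the writers' actual computation: the names `derive_key` and `score` are hypothetical, a “clear majority” is taken to mean more than half of the judges, and the population standard deviation is used. The inspection of rating distributions that supplemented the statistics in the study is not reproduced.

```python
from collections import Counter
from statistics import pstdev


def derive_key(ratings_by_item, sd_cutoff=0.8):
    """Return {item_index: set of ratings keyed "correct"}.

    ratings_by_item holds one list of judges' ratings (1-5) per item.
    Assumption: a "clear majority" means more than half of the judges.
    """
    key = {}
    for item, ratings in enumerate(ratings_by_item):
        # Eliminate items on which the judges disagree too widely.
        if pstdev(ratings) >= sd_cutoff:
            continue
        counts = Counter(ratings)
        majority = len(ratings) / 2
        # A single rating chosen by more than half of the judges.
        best, n_best = counts.most_common(1)[0]
        if n_best > majority:
            key[item] = {best}
            continue
        # Otherwise the strongest pair of adjacent ratings, if it has a majority.
        n_pair, low = max((counts[r] + counts[r + 1], r) for r in range(1, 5))
        if n_pair > majority:
            key[item] = {low, low + 1}
    return key


def score(responses, key):
    """Count the keyed items on which a respondent gives a "correct" rating."""
    return sum(1 for item, keyed in key.items() if responses[item] in keyed)


# Made-up ratings from 13 judges on three items (illustrative only).
judges = [
    [5] * 10 + [4] * 3,                       # single rating keyed
    [4] * 6 + [5] * 6 + [3],                  # two adjacent ratings keyed
    [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 3, 3, 3],  # eliminated, S.D. too large
]
key = derive_key(judges)
print(key, score([5, 5, 3], key))
```

With a key of this kind, a respondent's score is simply the number of keyed items on which his rating falls among the “correct” ratings.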

The subjects of the investigation were students in counseling 
courses and counselors at the University of Minnesota. They 
were administered the 64-item Questionnaire in the spring of 
1947 during the first week of the new term. Subjects came from 
two sources: 106 were students in either of two courses dealing 
with guidance techniques and counseling practices, and 53 were 
graduate students in Psychology or Educational Psychology 
who were either taking graduate courses in Counseling or were 
engaged in half-time college counseling. The former group of 
students answered the Questionnaire a second time, at the final 
session of the course in which they were enrolled. The students’
questionnaires were scored with the Minnesota key and with
the Chase key for the forty scorable items.

Results

In the first part of the study, summary statistics for the 40 
items keyed by the Minnesota judges, and these same items 
keyed by Chase’s judges, showed limited spread. The combined 
group of students (graduate and undergraduate) had an average
score of 30 and a standard deviation of 3 items on the
Minnesota key, and an average score of 17 with a standard
deviation of 4 on the Chase key. As scores obtained from the
two keys correlated .20 ± .09, it would appear that the keys
are quite dissimilar. However, inspection reveals two facts:
Twenty-four of the 40 items had two adjacent ratings keyed 
“correct” by Minnesota judges, while only seven of the same 
40 items had double ratings on the Chase key. On the Chase 
key, 32 of the 40 items have one extreme or the other keyed 
as “correct,” while Minnesota judges used the extreme re¬ 
sponses for only 24 of the 40 items. These facts, the greater 
tendency to key adjacent responses as “correct” by Minnesota 
judges and a reluctance to key extreme values on the part of 
these judges, could account for the higher mean and smaller 
variability of the Minnesota scale and possibly for the low 
correlation between the latter scale and that of Chase. 

To test the hypothesis that both keys were really different 
they were compared in terms of a trichotomy of “good,” “harm¬ 
ful,” and “doubtful.” Under such a comparison only three of 
the 40 items were classified differently. Three counseling prac¬ 
tices rated “good” by Chase’s judges were considered of “doubt¬ 
ful” value by the Minnesota raters. 

One other possibility was suggested as an explanation for the 
low inter-scale correlation, namely, the reliability of the 40- 
item Questionnaire. Reliability estimated in several ways 
(Kuder-Richardson and split-half uncorrected) turned out to 
be about .20. To achieve minimum satisfactory reliability the 
number of items would have to be increased fivefold. Assuming 
the appropriateness of these tests of reliability, it appears that 
the two scales correlate about as highly with each other as 
single administration reliability allows. 

The writers feel that further analysis of such low reliability
is not called for. Therefore, in spite of the unreliability of its
principal instrument, this study is presented for whatever interest
it may be to the reader.

The second part of the study involved an assessment of the 
effects of instruction upon counseling attitudes. The Question¬ 
naire was administered both before and after the undergraduate 
courses were given. In these two classes the mean scores ob¬ 
tained with the Minnesota key were about 30, with a standard 
deviation from 3 to 4 items on the pre-course administrations. 

In one class there was a slight, but not statistically significant, 
movement of the post-course mean score upwards toward the 
score of the instructor. In this class the post-course score corre¬ 
lated .11 ± .25 with final course grades. In the second class 
the post-course mean score was significantly lower (C.R. = 
2.8) than that group's pre-course mean score. Just why there 
should have been movement away from the instructor’s score 
in this second class is not readily apparent. It may be that this 
instructor was somewhat inconsistent in answering the Ques¬ 
tionnaire and in his actual teaching practices; or, simply, that 
the scale itself is too unreliable. In this second group there
appeared to be a moderate degree of relationship between
post-course score and course grades (r = .42 ± .10). In any event
these data offer equivocal evidence in support of the hypothesis
that counseling attitudes can be modified by training, although
it is clearer that subject-matter examinations in these
two courses are not satisfactory measures of those attitudes.
Further use of the Questionnaire with control groups would be 
helpful. 

A third estimate of reliability (which suggests the earlier two
are underestimates) is offered by the correlation between
pre-course and post-course scores for the 106 undergraduate
students, r = .52. The amount of time elapsed during the courses
was nine weeks.

The final problem investigated was the relationship of the 
amount of training and experience in counseling to scores on 
the Questionnaire. The two groups which were compared were 
the 106 undergraduates and 53 graduate students. The Minnesota
key mean-raw-score difference between the two groups is
statistically significant (C.R. = 2.8), with the graduates getting
the higher scores. This is evidence for the common-sense
hypothesis that the longer a student studies in a particular school
and/or discipline, the more likely he is to acquire the attitudes
of his instructors. Whether this greater agreement with
the judges is a result of instruction, extra-curricular reading, or
personal counseling experience cannot be answered from these
data.


Interpretation and Conclusions 

1. Although it was possible to obtain considerable agree¬ 
ment among a carefully selected group of judges on the de¬ 
sirability of certain counseling practices, two obvious limita¬ 
tions of the questionnaire approach to the measurement of 
counselor attitudes must be mentioned. First, most of the 
judges spoke about the artificiality of rating practices in a 
“general counseling situation.” They reported that specifica¬ 
tion of the type of client problem, as well as the nature of the 
agency function, seemed important in keying “correct” re¬ 
sponses. Second, the low reliability of the scale makes the cur¬ 
rent approach suspect. Perhaps more rigorous item construction 
and analysis might yield more consistent results. 

2. There is equivocal evidence that students’ attitudes
toward counseling practices are susceptible to change with
formal course training, and they are not markedly related to
grades in counseling courses.

3. Scores on a scale of counselor attitudes may have some
value in differentiating the more-experienced from the
less-experienced counselors or trainees in terms of a given set of
“correct” responses that have been empirically derived for a
local situation.

4. The reservations attendant to the use of the Chase items 
about counselor attitudes are such as to indicate that they 
should not be used in their present form. While the approach 
has possibilities, this study suggests the questionnaire analysis 
of counselor attitudes requires considerably further investiga¬ 
tion before it can be accepted as a useful, reliable tool. 



A NOTE ON THURSTONE’S METHOD OF COMPUTING
THE INVERSE OF A MATRIX


WILLIAM C. COTTLE
University of Kansas 

A research worker seeking a concise method of computing
the inverse of a matrix will find this in Thurstone’s method.

TABLE 1

Computations for Column II, Section B, of Thurstone's Example for
Computing the Inverse of a Matrix*

        a_i2     -c_21 b_i1                  b_i2

         .48     -(.60)(.80)  = -.48         .000
         .80     -(.60)(.48)  = -.288        .512
         .36     -(.60)(.36)  = -.216        .144
  Ck    1.64     -(.60)(1.64) = -.984        .656
  Σ     1.64                    -.984        .656

         .00     -(.60)(1.00) = -.60        -.600
        1.00     -(.60)(0)    =  .00        1.000
         .00     -(.60)(0)    =  .00         .000
  Ck    1.00     -(.60)(1.00) = -.60         .400
  Σ     1.00                    -.60         .400

  Ck    2.64     -(.60)(2.64) = -1.584      1.056
  Σ     2.64                    -1.584      1.056

* Not rounded to significant figures as in Thurstone’s example (1, p. 47).



The method is applicable to a matrix of any size. Such a per¬ 
son may find also, upon examining this method as outlined by 
Thurstone, that it would appear to be a rather esoteric solu¬ 
tion. 1 Possibly, because of Thurstone’s familiarity with the 
method, he overestimates the cognitive powers of his readers. 
Clarification of one step in the process of computing the inverse
would simplify the method. It is for this purpose that
this paper has been written.

1 Thurstone, L. L. Multiple-Factor Analysis. Chicago: University of Chicago Press,
1947. Pp. 46-48.

TABLE 2

Computations for Column III, Section B, of Thurstone’s Example for
Computing the Inverse of a Matrix*

[Table body not recoverable from the source.]

I36 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 

Thurstone’s directions are simple to follow in setting up
Section A of the example he gives.² To compute Section B and
Section C, the steps are as follows:

1. Column I of Section B is copied exactly from Column I of
Section A for both matrices.

2. The reciprocal, $1/b_{11}$, is the reciprocal of the first entry in
Column I of Section B, or 1/.80 = 1.25. This is recorded at
the bottom of this column as shown in Thurstone’s example.

3. Values for the first column of Section C are computed by
multiplying Column I of Section B by this reciprocal,

$$c_{i1} = b_{i1}(1/b_{11}).$$

(The reciprocal is constant for both the values of the original
matrix and those of the identity matrix.)

4. Column II of Section B is computed by the formula:

$$b_{i2} = a_{i2} - (b_{i1} c_{21})$$

where $c_{21} = .60$ is constant for the entire column, both the
original matrix and the identity matrix. Computations
for Column II of Section B are shown in detail in Table 1.

5. The second column of Section C is computed by the formula:

$$c_{i2} = b_{i2}(1/b_{22}),$$

where $b_{22}$ is the second entry in Column II of Section B.

6. Column III of Section B is computed by the formula:

$$b_{i3} = a_{i3} - b_{i1} c_{31} - b_{i2} c_{32}$$

where $c_{31} = .45$ and $c_{32} = .27$ are constants for each appropriate
column of Section B. Computations for Column III
of Section B are shown in detail in Table 2.

7. The rest of the computations can be followed from Thurstone’s
explanation.
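
Steps 1 through 6 can also be sketched in code. The following Python is a minimal illustration, not Thurstone's or the writer's worksheet: the function name `sections_b_and_c` is hypothetical, the matrix is assumed to be symmetric, and the entry a_33 = .80 in the example is an assumption made only to complete a 3 x 3 matrix. The other entries are those that appear in the computations of Table 1.

```python
def sections_b_and_c(a):
    """Compute Sections B and C for the array formed by A stacked on I.

    a is a square symmetric matrix given as a list of rows.
    Row i of the result covers the original matrix for i < n and the
    identity matrix for i >= n, as on the worksheet.
    """
    n = len(a)
    # Section A: the original matrix with the identity matrix beneath it.
    stacked = [list(map(float, row)) for row in a] + [
        [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)
    ]
    rows = len(stacked)
    b = [[0.0] * n for _ in range(rows)]
    c = [[0.0] * n for _ in range(rows)]
    for k in range(n):
        for i in range(rows):
            # b_ik = a_ik - sum over earlier columns m of b_im * c_km
            b[i][k] = stacked[i][k] - sum(b[i][m] * c[k][m] for m in range(k))
        # Column k of Section C: Column k of Section B times 1/b_kk.
        reciprocal = 1.0 / b[k][k]
        for i in range(rows):
            c[i][k] = b[i][k] * reciprocal
    return b, c


# a_33 = .80 is hypothetical; the remaining entries appear in Table 1.
A = [[0.80, 0.48, 0.36],
     [0.48, 0.80, 0.36],
     [0.36, 0.36, 0.80]]
B, C = sections_b_and_c(A)
column_ii = [row[1] for row in B]
print([round(value, 3) for value in column_ii])
```

Column II of Section B computed this way agrees with the right-hand column of Table 1 for both the original matrix and the identity matrix. The sketch stops at Sections B and C; the remaining steps toward the inverse are left to Thurstone's explanation, as in step 7.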

It is hoped that this explanation will enable anyone to follow
this method of computing an inverse. The writer has used
both Tucker’s method³ and this method in computing an inverse
of a matrix of the order of 8 x 8, and prefers the latter method.
He would suggest also that no rounding of figures be done
until the computation of the inverse is reached.

2 The writer is indebted to Dr. Clyde Coombs of the University of Michigan for
the information necessary to follow Thurstone's explanation. The writer spent three
days in company of two competent mathematicians in an unsuccessful attempt to
follow the method before resorting to a letter to Dr. Coombs.

3 Tucker, L. “A Method for Finding the Inverse of a Matrix.” Psychometrika, III
(1938), 189-197.



NOMOGRAPH OF PETERS AND VAN VOORHIS’ 
APPROXIMATION FORMULA FOR CORRECTING 
INTERFUNCTION CORRELATION COEFFICIENTS 
FOR HETEROGENEITY 

WILLIAM A. REYNOLDS 
National Broadcasting Company 

In setting up a testing procedure for selection and placement 
of employees in a large organization, it is often the practice to 
administer one or two tests to certain groups of applicants, and 
to add new tests to the schedule from time to time. Thus, when 
it is desired to construct a test battery from the results of a 
multiple-regression study, it is found that the populations to 
which the individual tests have been administered are larger 
than the population to which the two or more tests in combina¬ 
tion have been administered. The larger populations usually 
have larger standard deviations; statistically they are more 
heterogeneous. Since they are the populations on which later 
test batteries will be validated, the information on hetero¬ 
geneity may be used to predict more accurately the true rela¬ 
tionships between two tests which have been administered to 
but a fraction of the number to which each separately has been 
administered. 

It is well known that the size of a coefficient of correlation is 
affected by the heterogeneity (range of talent) of the population 
on which it is computed. If a formula were available for correct¬ 
ing the correlation between two tests by taking into considera¬ 
tion the ranges of talent on both tests, a better estimate of the 
true correlation between them could be obtained. Such a formula
is available in Peters and Van Voorhis’ Statistical Procedures
and Their Mathematical Bases.¹
These authors develop their formula by considering first the
problem of estimating a corrected reliability coefficient. The

1 Peters, C. C., and Van Voorhis, W. R. Statistical Procedures and Their
Mathematical Bases. New York: McGraw-Hill Book Co., Inc., 1940. Pp. 208-210.
formula for the correction of a reliability coefficient for heterogeneity
is given by Peters and Van Voorhis in formula 129, as
follows:

$$\frac{\sigma_x}{\Sigma_x} = \frac{\sqrt{1 - R_{1I}}}{\sqrt{1 - r_{1I}}} \qquad (129)$$

(Formula for correcting a reliability coefficient for heterogeneity)

where,

$\sigma_x$ = standard deviation of the shorter range of talent
$\Sigma_x$ = standard deviation of the longer range of talent
$r_{1I}$ = reliability coefficient of the shorter range of talent
$R_{1I}$ = reliability coefficient of the longer range of talent.

Any unknown term of this formula may be easily computed
by nomograph 55 in the Handbook of Statistical Nomographs
by Dunlap and Kurtz.²
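
Formula 129 can also be solved directly for the reliability in the longer range. The short Python sketch below is offered only as an illustration of that arithmetic; the function name `corrected_reliability` and the numbers in the example are hypothetical and do not come from Peters and Van Voorhis.

```python
def corrected_reliability(r_short, sd_short, sd_long):
    """Solve formula 129 for the reliability in the longer range of talent.

    sigma / Sigma = sqrt(1 - R) / sqrt(1 - r)
    gives  R = 1 - (sigma / Sigma)**2 * (1 - r).
    """
    ratio = sd_short / sd_long
    return 1.0 - ratio ** 2 * (1.0 - r_short)


# Hypothetical values: r = .80 in a group whose S.D. is half that of the
# wider group.
print(round(corrected_reliability(0.80, 10.0, 20.0), 2))  # 0.95
```

When the two standard deviations are equal the function returns the original coefficient unchanged, which is a quick check on the algebra.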

The case of inter-function correlation is more complicated. 
The assumption is made that the "variance of the distribution 
of true scores in the one function from their corresponding 
true scores in the other function is the same in the shorter 
range as it is in the longer one." The following formulas are 
derived: 


$$\frac{\sigma_x}{\Sigma_x} = \frac{\sqrt{R_{1I} - (R_{xy}^2 / R_{2II})}}{\sqrt{r_{1I} - (r_{xy}^2 / r_{2II})}} \qquad (130)$$

(Formula for correcting inter-function r’s for heterogeneity)


Since these formulas involve reliability coefficients of the meas¬ 
urements in both functions for both ranges, and information 
regarding these usually is not available, a formula using ob¬ 
tained scores rather than true scores is presented: 

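$$\frac{\sigma_x}{\Sigma_x} = \frac{\sqrt{1 - R_{xy}^2}}{\sqrt{1 - r_{xy}^2}} \qquad (131)$$

(Approximate formula for correcting inter-function r’s for heterogeneity)

Similarly,

$$\frac{\sigma_y}{\Sigma_y} = \frac{\sqrt{1 - R_{xy}^2}}{\sqrt{1 - r_{xy}^2}} \qquad (131a)$$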


2 Dunlap, J. W., and Kurtz, A. K. Handbook of Statistical Nomographs.
Yonkers-on-the-Hudson: World Book Co., 1932.
where the assumption is made that the standard errors of
estimate are the same in the shorter range as in the longer one.
Under the condition of having an $r_{xy}$ between two tests in a
shorter range of talent than the range of either test taken
separately, the $r_{xy}$ must be corrected twice: once for heterogeneity
in each test. Taking first the correction for heterogeneity
in the x variable, formula 131 may be expressed as
follows:

$$R_{Xy} = \sqrt{1 - \frac{\sigma_x^2 (1 - r_{xy}^2)}{\Sigma_x^2}} \qquad (A)$$

(Approximate correlation coefficient when the x variable has
been corrected for heterogeneity)

In turn, correcting $R_{Xy}$ in formula (A) for heterogeneity in the
y variable, we get:

$$R_{XY} = \sqrt{1 - \frac{\sigma_y^2 (1 - R_{Xy}^2)}{\Sigma_y^2}} \qquad (B)$$

(Approximate correlation coefficient when both the x and y variables
have been corrected for heterogeneity)

where $R_{XY}$ is the corrected coefficient of correlation when both
variables are corrected for heterogeneity, and $R_{Xy}$ is the coefficient
obtained from formula (A).

The solution of these equations is rather involved but can be 
estimated quite simply on the accompanying nomograph. The 
steps to be taken may be illustrated on the following problem. 

From the shorter range of talent, the group to which tests x
and y both were administered, the correlation coefficient and
the standard deviations of x and y were found to be

$r_{xy} = .65 \qquad \sigma_x = 12.0 \qquad \sigma_y = 18.0$

And from the longer range of talent, the whole populations to
which either of the tests was administered, the standard deviations
were found to be

$\Sigma_x = 15.0 \qquad \Sigma_y = 22.0$

The problem is to find:

$R_{Xy}$ = coefficient of correlation corrected for heterogeneity in
x, and

$R_{XY}$ = coefficient of correlation corrected for heterogeneity in
both x and y.

Step 1. Place a straightedge on the line at the left which
corresponds to the standard deviation obtained on the shorter
range of talent in the x variable ($\sigma_s = \sigma_x = 12$) across to the
correlation coefficient $r_s = r_{xy} = .65$. (The subscript s on the
nomograph refers to the “shorter” or sample distribution, although
this standard deviation may occasionally be greater than that
of the “longer” distribution of the whole population to which
the test has been administered.)

Step 2. Place a pin on the middle reference line of the nomograph,
and pivot the straightedge so that it reads the standard
deviation obtained from the longer distribution ($\sigma_L = \Sigma_x = 15$)
of test x, and read off the corrected coefficient ($r_L = R_{Xy} = .80$)
from the scale at the right. This obtains $R_{Xy}$, the correction for
$r_{xy}$ when the x variable alone has been corrected for
heterogeneity.

Step 3. In turn, to correct $R_{Xy}$ for heterogeneity in the y
variable, place a pin on the value for the corrected coefficient
($R_{Xy} = .80$) and pivot the straightedge so that it reads the
standard deviation of test y on the shorter range of talent
($\sigma_s = \sigma_y = 18$).

Step 4. Pivot again with a pin held on the middle reference
line; change the left side of the straightedge to read the standard
deviation on the longer range of talent of test y
($\sigma_L = \Sigma_y = 22$). The result, $R_{XY} = .87$, is read from the scale
at the right. This is the correlation coefficient between the two
tests when both have been corrected for heterogeneity.
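
The nomograph readings can be checked numerically. The Python sketch below applies the approximate formula once for each variable, using the values of the illustrative problem; the function name `correct_for_heterogeneity` is hypothetical, and the positive square root is assumed throughout.

```python
import math


def correct_for_heterogeneity(r, sd_short, sd_long):
    """Approximate correction of r for heterogeneity in one variable.

    R = sqrt(1 - (sd_short / sd_long)**2 * (1 - r**2)), positive root assumed.
    """
    inside = 1.0 - (sd_short / sd_long) ** 2 * (1.0 - r ** 2)
    if inside < 0:
        raise ValueError("no real solution for these values")
    return math.sqrt(inside)


# Values from the illustrative problem in the text.
r_x_only = correct_for_heterogeneity(0.65, 12.0, 15.0)    # corrected for x
r_both = correct_for_heterogeneity(r_x_only, 18.0, 22.0)  # then for y
print(round(r_x_only, 2), round(r_both, 2))  # 0.79 0.87
```

The first value computes to about .794, which the nomograph reads as .80; the final value agrees with the .87 read in Step 4. If the sample standard deviation exceeds that of the wider group, the same function lowers the coefficient instead of raising it.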

Quite often on a matrix of intercorrelations such as would be
obtained in the development of a test battery of aptitude tests, 
the inter-function correlation coefficients will be corrected up¬ 
ward when the correction for heterogeneity is made in one 
variable, but will be reduced when corrected again for the 
heterogeneity in the other variable. When this occurs, it will be 
caused by the standard deviation of the test group on one 
variable being larger than the standard deviation reported in 
the published norms or obtained on the larger industrial group 
to which the test has been administered. But in the cases where 
the standard deviations of the two restricted distributions are 
less than the standard deviations of the corresponding wider 
distributions, the corrected correlation coefficients always will 
be higher. 

A SINGLE CHART FOR TETRACHORIC r


WILLIAM LEROY JENKINS 
Lehigh University 

The widely used Thurstone diagrams¹ for determining tetrachoric
r are now out of print. As a substitute, a short-cut
method has been devised which employs a single chart.

Essentially, the chart compares the actual
percentage-in-excess-of-chance in one cell with the
percentages-in-excess-of-chance for r’s of .90, .80, .70, and .60.
The interpolation is made graphically if the r is above .60 and
arithmetically if the r is lower.


Method with Examples

Example A:   273    296*           Example B:    55     80
             423    [illegible]                  *70     20

1. Mark the number (*) in the upper right or lower left,
whichever number is smaller.

2. Compute the two percentiles at which the distributions
are cut.

   Example A: 56.1% (above); [illegible]% (right)
   Example B: 40.0% (below); 55.6% (left)

3. Multiply the two percentiles to obtain the chance percentage
in the marked cell.

   Example A: 17.6%          Example B: 22.2%

4. Compute the actual percentage in the marked cell.

   Example A: 29.2% (N = 1016)          Example B: 70/225 = 31.1%

5. Subtract result 3 from result 4 to obtain the actual percentage
in excess of chance. Draw a vertical line downward
from this value on the scale at the top of Figure I.

   Example A: 11.6%          Example B: 8.9%

1 Chesire, L., Saffir, M., and Thurstone, L. L. Computing Diagrams for the
Tetrachoric Correlation Coefficient. Chicago: Chicago University Press, 1933.

6. In each of the four sets of curves in Figure I: Find the
larger cutting percentile on the ordinate scale. Move across,
interpolating between the curves, to the smaller cutting percentile.
Drop a vertical to the baseline and mark this point.
(The four points represent respectively the
percentages-in-excess-of-chance for r’s of .90, .80, .70, and .60.)

[Figure I. Chart for tetrachoric r. Scale at top: actual percentage in
excess of chance.]

7. Through the points marked on the four baselines, draw a
curve. Where the curve intersects the straight line drawn in
step 5, read off the tetrachoric r from the scale on the right.

   Tetrachoric r (Example A) = .89

8. If the curve does not intersect the vertical line, the tetrachoric
r is less than .60. Make an arithmetical interpolation as
indicated below.

$$\text{Tetrachoric } r \text{ (Example B)} = .60 \times \frac{8.9}{9.6} = .56$$


The chart in Figure I is too small for actual use, but the 
author will be glad to furnish without charge a photoprint re¬ 
production of an x 11 chart on cross-section paper. In em¬ 
pirical tests this chart appears to give the same answers as the 
Thurstone diagrams. 
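
The arithmetic of steps 2 through 5 and of step 8 can be sketched in code, using Example B. This Python is only an illustration: the function names are hypothetical, and the value 9.6 (the percentage in excess of chance for r = .60) is a chart reading taken from the example and is not computed here. The graphical interpolation of steps 6 and 7 is not reproduced.

```python
def excess_over_chance(upper_left, upper_right, lower_left, lower_right):
    """Actual minus chance percentage in the marked cell (steps 1-5)."""
    n = upper_left + upper_right + lower_left + lower_right
    if upper_right <= lower_left:
        # Marked cell is upper right: use the upper row and right column.
        marked = upper_right
        row_total = upper_left + upper_right
        col_total = upper_right + lower_right
    else:
        # Marked cell is lower left: use the lower row and left column.
        marked = lower_left
        row_total = lower_left + lower_right
        col_total = upper_left + lower_left
    chance = (row_total / n) * (col_total / n)
    actual = marked / n
    return 100.0 * (actual - chance)


def interpolate_below_60(excess, excess_at_60):
    """Step 8: arithmetical interpolation when r is less than .60."""
    return 0.60 * excess / excess_at_60


# Example B; 9.6 is the chart reading for r = .60 given in the example.
excess_b = excess_over_chance(55, 80, 70, 20)
r_b = interpolate_below_60(excess_b, 9.6)
print(round(excess_b, 1), round(r_b, 2))  # 8.9 0.56
```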

Algebra Prognosis Test, by Corydon L. Rich. Designed as a help in
forecasting a pupil’s work and as a guide for sectioning
when numbers warrant more than one section. Range: high
school and college. Working time: 32 minutes. Published
by the C. A. Gregory Co.


Aptitude Tests for Occupations, by Wesley S. Roeder and Herbert B.
Graham. There are six tests in the battery attempting to
measure personal-social, mechanical, general sales, clerical
routine, computational and scientific aptitudes. Range: high
school and college students, and adults. Working time:
1 hour and 50 minutes for complete battery. Published by
California Test Bureau.


Children’s Apperception Test, by Leopold Bellak. A personality test
specifically designed for use with children between three and
ten years of age, of both sexes and of all ethnic groups.
Consists of ten pictures of animals in various social situations.
Price: set of pictures, manuals and 30 record analysis blanks,
$9.00. Published by C. P. S. Company.


Comprehensive Examination in Psychology , by M. Pullins Claytor. 
An achievement test for college students in psychology. 
Working time: 50 minutes. Published by the C. A. Gregory 
Co. 


Cooperative General Culture Test (Forms X and Y), by Norman T.
Blair, Jeanne M. Bradford, Mirian May Bryan, Paul J.
Burke and Herbert Danzer. Designed to provide an indication
of the student’s general cultural background. The content
has been determined by the consensus of a number of
scholars in various fields. Consists of six sections covering
current social problems, history and social studies, literature,
science, fine arts and mathematics. Range: college
students. Working time: 180 minutes. Price: test booklets,
per package of 25, $5.90; answer sheets, per package of 25,
$1.70. Published by Cooperative Test Division of the Educational
Testing Service.

* The tests listed bear 1949 or 1950 copyright dates. The addresses of publishers
are given at the end of section. In some instances, certain details (particularly
prices) are not included because they were not available at the time of going to
press.

Cowan Adolescent Adjustment Analyzer, by Edwina A. Cowan, Wilbert
J. Mueller and Edna Weathers. Intended for use as a
screening device to discover individuals who would profit
from referral to visiting teachers, psychiatrists, guidance
counselors, etc. Range: junior and senior high school. Working
time: no time limit. Price: $2.65 per package of 25 tests.
Published by Bureau of Educational Measurements, Kansas
State Teachers College.


Diagnostic Tests of Achievement in Music (Form A), by M. Lela
Kotick and T. L. Torgerson. Enables the teacher to determine
each pupil’s level of mastery of the basic theory and
skills in music and to locate the nature of the weaknesses or
difficulties in music fundamentals for individuals as well as
classes. Range: school music classes. Working time:
approximately 45 minutes. Published by California Test Bureau.


Geometry Attainment Test, by R. D. Walton. An achievement test
for students with 6 months or more of geometry. Working
time: 60 minutes. Price: tests, 5/- per dozen; manual, 1/-
each. Published by University of London Press, Ltd.


Graded Arithmetic-Mathematics Test, by Philip E. Vernon.
Constructed, like the Stanford-Binet intelligence scale, from
sets of short problems, one set for each year level. Scores
are expressed in Arithmetic-Mathematics Ages from 7-21
years. Range: ages 7-21. Working time: 20 minutes. Published
by University of London Press, Ltd.


Guilford-Zimmerman Temperament Survey, by J. P. Guilford and
Wayne S. Zimmerman. Scores are obtained for the following
areas: general activity, restraint, ascendance, sociability,
emotional stability, objectivity, friendliness, thoughtfulness,
personal relations and masculinity. Range: senior high
school, college and adults. Working time: approximately 45
minutes. Price: package of 25 reusable answer booklets,
$3.75; answer sheets, 3¢ each. Published by Sheridan Supply
Company.


Heston Personal Adjustment Inventory, by Joseph C. Heston. Designed
to measure, for guidance purposes, the personal adjustment
of the normal individual in six areas: analytical thinking,
sociability, emotional stability, confidence, personal relations,
home satisfaction. Range: high school and college.
Working time: 40 to 50 minutes (no time limit). Price: $2.25
per package of 25 tests. Published by World Book Company.


Holborn Vocabulary Test for Young Children, by A. F. Watts.
Consists of 100 questions concerning body parts, household
articles, eating, drinking, actions with hands and fingers,
etc., to be answered orally by the child. Range: 3½ years of
age and upward. Working time: no time limit. Price: 1/-.
Published by George G. Harrap and Company, Ltd.


Iowa Every-Pupil Tests of Basic Skills (Form /), prepared under the
direction of E. F. Lindquist. A battery of tests designed to
measure certain skills involved in reading, work-study, language
and arithmetic at the elementary-school level. There
are fourteen separate tests: Reading Comprehension, Vocabulary,
Map Reading, Use of References, Use of Index, Use of
Dictionary, Graphs, Punctuation, Capitalization, Language
Usage, Spelling, Arithmetic Concepts, Arithmetic Processes
and Arithmetic Reasoning. Range: grades 5-9. Working time:
five and one-half hours for complete battery. Price: available
upon application. Published by Science Research Associates.


Metropolitan Readiness Tests, by Gertrude H. Hildreth and Nellie L. 
Griffiths. Consists of six subtests designed to measure a 
child’s readiness to undertake the work of the first grade. 
The first four tests measure comprehension of words and 
sentences and visual perception, the fifth measures number 
knowledge and the sixth measures a combination of visual 
perception and motor control. Contains also a supplementary 
Drawing A Man test. Range: pre-first grade children. Working 
time: approximately 60 minutes. Price: $2.10 per package 
of 25 tests. Published by World Book Company. 


Murphy-Durrell Diagnostic Reading Readiness Test, by Helen A. 
Murphy and Donald D. Durrell. Designed to furnish a 
measure of three critical abilities: auditory discrimination, 
visual discrimination, learning rate. It is a test for group 
use. Range: intended for first graders. Working time: tests 1 
and 2 approximately 1 hour (no time limit); learning rate 
test has specific time limits. Price: $1.55 per package of 25 
test booklets; $1.25 per package of flash cards. Published 
by World Book Company. 


Musical Aptitude Test (Series A), by Harvey S. Whistler and Louis 
P. Thorpe. Designed to measure an individual’s aptitude for 
the study of music. Consists of five parts: rhythm recognition, 
pitch recognition, melody recognition, pitch discrimination, 
and advanced rhythm recognition. Range: grades 4-10. 
Working time: approximately 40 minutes. Published by 
California Test Bureau. 


Revere Safety Test, by Revere Copper and Brass, Inc., in cooperation 
with the Psychological Evaluation and Services Center, 
Syracuse University. Designed to measure knowledge of 
correct safety procedures in the industrial situation. The 
subject is required to tell whether each of 162 pictures illustrates 
good or bad safety practices. Four areas of industrial 
safety are covered: general safety, pilings, carrying and 
traffic, tools and machine operation. Working time: 20 
minutes. Price: reusable test booklets, each 30¢; answer 
sheets, package of 25, $1.00; scoring stencil, each 10¢. Published 
by Science Research Associates. 


Small Parts Dexterity Test, by John K. and Dorothea M. Crawford. 
A performance test designed to measure fine eye-hand coordination. 
Part I measures dexterity in using tweezers to 
insert small pins in close-fitting holes in a plate and to place 
small collars over protruding pins. Part II measures dexterity 
in placing small screws in threaded holes in a plate 
and screwing them down with a screwdriver until they drop 
through the plate into a metal dish below. Working time: 
about 15 minutes. Price: $25.00 complete with manual and 
spare parts. Published by Psychological Corporation. 


SRA Self-Scorer, by Maurice K. Troyer and George W. Angell. A 
new type of answer sheet designed for use with any teacher-constructed 
objective test. Its primary function is to promote 
student learning by immediately revealing whether a 
test question has been answered correctly or incorrectly. 
Questions must be arranged to fit one of the eight answer 
keys. Four types of answer keys are provided: 1) true-false 
(space for 300 questions); 2) true-false and multiple choice 
(space for 210 questions); 3) four-choice (space for 150 
questions); and 4) five-choice (space for 150 questions). 
Each of these types is published in two different forms. 
Price: self-scorer, complete, each $1.50; answer sheets, per 
package of 25, $1.00. Published by Science Research Associates. 


SRA Youth Inventory (Form A), by H. H. Remmers and Benjamin 
Shimberg. A check list of 298 questions that has been designed 
as a tool to help teachers, counselors and school administrators 
to identify quickly the problems that young 
people say worry them most. Range: teen-age students. 
Working time: approximately thirty minutes. Price: reusable 
booklet with answer pad, 48¢; package of 25 answer pads, 
$1.75; scoring stencil, COfi. Machine scored form: reusable 
answer pad, 42¢; package of 100 answer sheets, $2.90; 
scoring stencils, $2.50. Published by Science Research Associates. 


Social Intelligence Test, by J. A. Moss, T. Hunt and K. A. Omwake. 
Designed to measure one’s ability to get along with others. 
Consists of five parts measuring: judgment in social situations, 
memory for names and faces, observation of human 
behavior, interpretation of mental state from spoken or 
written words, and sense of humor. Range: for high school, 
college and industrial use. Working time: 45 minutes (two 
shorter forms of 40 minutes and 30 minutes are available). 
Price: $3.75 per package of 25 tests (regular form). Pub¬ 
lished by Center for Psychological Service, George Washing¬ 
ton University. 


State High School Testing Service for Indiana offers a list of 49 sub¬ 
ject-matter tests, intelligence scales and inventories based 
on the Indiana courses of study, approved text books and 
teaching practices. The list is as follows: Agriculture: Animal 
Husbandry, Farm Shop Tools (Forms A and B); Commercial: 
Commercial Arithmetic, Bookkeeping (first and third semes¬ 
ters), Shorthand (first and third semesters), Typewriting (first 
and third semesters); English: Mechanics of Written English 
(grades 9-12), Tools of Written English (grades 7-8), 
Purdue Reading Test (grades 7-12); Health and Safety 
Education; Home Economics (high school): Child Development, 
Clothing I (Forms A and B), Clothing II, Foods I, 
Foods II, Home Care of the Sick, Housing of the Family; 
Home Economics (grades 7-8): Care and Play of Children, 
Clothing Problems, Food in the Home, Housekeeping; Language: 
French Recognition Vocabulary (Forms K and L), 
(first and third semester), Spanish (first semester); 
Mathematics: Algebra (first and third semesters), Arithmetic 
Fundamentals (Forms A and B), Plane Geometry (first 
semester), Solid Geometry, Trigonometry; Mechanical 
Drawing; Science: Biology (first semester), Chemistry (first 
semester), General Sciences (first semester), Physics (first 
semester); Social Studies: Civics-Junior High School (first 
semester), Civics-Senior High School (first semester), Civics-Senior 
High School (one semester course), Economics, American 
History (first semester), World History (first semester), 
Two Thousand Test Items in American H.-uoiv fncuid, 
905I); Guidance: A.C.E. Psychological Examination, Otis 
Quick-Scoring (Gamma Am), Henmon-Nelson (grades 7-12), 
High School Attitude Scale (Forms A and B), Purdue 
Personality Schedule, Maturity Rating Scale, Purdue Physical 
Science Test; Teacher Self-Evaluation: A Diagnostic 
Teacher Rating Scale (grades 4-8, Forms A and B), Purdue 
Rating Scale for Instruction (in lots of 500 or more). The 
prices of these tests range from to 6p, plus 1$. per copy 
for tests going out of state. Exact prices may be obtained 
from publishers, State High School Testing Service for 
Indiana, Purdue University, Lafayette, Indiana. 


Tests for Infants 4-12 Weeks Old (Test A), by A. R. Gilliland. Designed 
to measure adaptation to the physical and social 
environment. Price: $2.00 per package of 25 test record 
sheets and examiner's manual. Test equipment may be obtained 
from the author at Northwestern University. Published 
by Houghton Mifflin Company. 


Test of English Usage (Forms A & B), by Henry D. Rinsland, Raymond 
W. Pence, Betty S. Beck and Roland L. Beck. Designed 
to measure the student’s ability to recognize and 
apply the basic rules of English composition. Consists of 
three parts: mechanics of writing; accurate use of words; 
building sentences and paragraphs. Range: high school and 
college. Working time: no time limit. Published by Cali¬ 
fornia Test Bureau. 


Wechsler Intelligence Scale for Children, by David Wechsler. A psychodiagnostic 
instrument which has grown logically out of 
the Wechsler-Bellevue intelligence scales used with adolescents 
and adults. In fact, most of the items in the W.I.S.C. 
are from Form II of the earlier scales, the main addition 
being new items at the easier end of each test to permit 
examination of children as young as five years of age. Range: 
primarily for use with the school-age child. Working time: 
45 minutes to 1 hour. Price: $19.50, including manual and 
25 record forms. Published by the Psychological Corporation. 


THE THEORY AND CLASSIFICATION OF 
CRITERION BIAS 


HUBERT E. BROGDEN 
and 

ERWIN K. TAYLOR 
Personnel Research Section, AGO 1 

Introduction 

In that area of psychology concerned with the development 
of tests and other predictive instruments, psychologists have 
continually emphasized the need for validation. This insistence 
is sufficiently pronounced to serve as a trade mark of profes¬ 
sional psychologists. It is consistent with this insistence upon 
validation, that the importance of the criterion problem has 
been widely recognized. This is particularly true of the many 
psychologists connected with the various testing programs con¬ 
ducted during World War II. However, little attention and less 
effort have been devoted to a systematic consideration of the 
problems involved in criterion construction. Publications by 
Bellows (1), Stuit (13), Toops (15), Viteles (18) and Guilford 
(9) are among the few dealing particularly with these problems. 
Any systematic consideration of the problems involved in 
criterion construction inevitably leads to the problem of bias; 
to a consideration of the ways in which components which 
should properly be a part of the criterion are omitted; to the 
ways in which extraneous components are introduced; and to 
how distortion of weighting or of scale units occurs. 
This paper will attempt a systematic consideration of these 
problems. A classification of bias will be introduced and related 
to the steps involved in criterion construction. The more specific 
problems of bias encountered will then be discussed in relation 
to this classification system and in relation to the various types 
of criteria (i.e., production records, ratings, achievement tests, 
etc.). 


1 The opinions expressed are those of the authors and do not necessarily express the 
official views of the Department of the Army. 

Before proceeding further we should like to discuss two points 
important to the authors’ general orientation in attacking criterion 
problems. The essence of the first point in question may 
be stated as follows: In seeking to define criterion problems— 
particularly those of criterion bias—it must be recognized that 
the objective of criterion construction is subsidiary to that of 
selecting the most efficient battery of predictors. Prediction 
instruments are validated for the purpose of picking the best 
selection battery, assigning appropriate weight to each of its 
several components, and determining the effectiveness of the 
battery. The criterion achieves its sole function if it makes these 
objectives of validation possible. In the development of an in¬ 
dustrial selection program, for example, the criterion should 
give an accurate and unbiased measure of the extent to which 
individuals in the validation population contribute to or detract 
from the efficiency of the organization. This may be taken as 
axiomatic. If so, the emphasis in criterion construction must be 
in terms of the objectives of the prediction problem. 

Criteria differ from predictors in that the former must be 
tested in terms of a concept that we carefully avoid in the latter. 
In constructing or choosing from among existing predictors, an 
empirical approach can be, and often is, profitably used. Re¬ 
course to previous research results, information based on job 
analysis, hunches, hypotheses, and intelligent guesses all provide 
legitimate bases upon which to predicate a potential 
selection battery. Wrong guesses can be costly in terms of 
wasted research resources, but they are not misleading since 
they are put to the empirical test of how well each accomplishes 
the objectives of the prediction task, i.e., how well each corre¬ 
lates with the criterion. 

The criterion, by contrast, can be subjected to no wholly 
satisfactory empirical test of its adequacy. The criterion must, 
consequently, be logically justifiable as valid in its own right. 
The remainder of this paper is predicated on the acceptance of 
this point of view. Invalid and biased criteria, again in contrast 
to predictors, cannot be eliminated through empirical demon¬ 
stration of their inadequacy. Thus, the faulty criterion not only 
wastes research efforts, but seriously reduces the effectiveness 
of the final outcome of the program. 

For the purpose of this discussion, a biasing factor may be 
defined as any variable, except errors of measurement and 
sampling error, producing a deviation of obtained criterion 
scores from a hypothetical "true” criterion score. It is apparent 
that this definition is quite general and leads to the considera¬ 
tion of all factors which bear upon the desirability or undesir¬ 
ability of criterion elements and their combination. Of course, 
the practical consideration which faces the research worker in a 
“real” situation precludes the complete elimination of all 
undesirable aspects of criterion construction. Perfection may 
be approached—it is not likely to be achieved. Nonetheless, 
to improve his criteria to the point optimal for the conditions 
under which he is working, the research psychologist must know 
the importance of different types of bias, the manner in which 
each will probably affect his results, the proper emphasis to 
place upon the elimination of those factors producing a distor¬ 
tion of results of indeterminate magnitude, and, finally, the 
probable effect of bias that cannot be entirely eliminated. It 
will be shown that different types of biasing factors vary widely 
in their distortive effect, generally as a function of the degree of 
their correlation with the members of the predictive battery. 
Some biasing factors influence the validity coefficients but have 
little or no effect on estimates of criterion reliability. Others 
affect both. Still others may alter the apparent reliability of the 
criterion without seriously influencing the validity. 

Classification of Biasing Factors 

Imperfections or bias in the criteria may be classified as: 

(1) Criterion Deficiency—omission of pertinent elements from 
the criterion. 

(2) Criterion Contamination—introducing extraneous elements 
into the criterion. 

(3) Criterion Scale Unit Bias—inequality of scale units in 
the criterion. 

(4) Criterion Distortion—improper weighting in combining 
criterion elements. 

The above classification of criterion bias is functional in terms 
of the steps the authors consider essential to adequate criterion 
construction. These steps may be indicated as follows: 

(1) Careful analysis of the total situation in which the criterion 
behavior occurs for the purpose of isolating all 
sub-criterion variables and obtaining preliminary estimates 
of their relative importance—the determination 
of what is to be measured. 

(2) The construction of procedures and/or scales for the 
measurement of these elements—determination of how 
each element is to be measured. 

(3) Development of a procedure for combining these ele¬ 
ments into the desired single composite—determination 
of the relative importance of each element to over-all 
efficiency. 

Criterion deficiency is most apt to occur in the process of de¬ 
termining the variables to be included in the criterion. Con¬ 
tamination and criterion scale-unit bias are most likely to appear 
in the process of constructing scales for the measurement of the 
sub-criterion elements while criterion distortion results primarily 
from faulty methods of combining the criterion elements. 

Each of the three steps of criterion construction is necessarily 
involved, however sketchily, in the development of any cri¬ 
terion. The rationale of our classification of bias is so intimately 
related to the belief in the need for an explicit plan of construc¬ 
tion involving these three steps as to justify further clarification 
of the implications of each in its relation to bias. 

The desirability of establishing the variables important to 
“success” by observation and job analysis (step 1) before pro¬ 
ceeding to scale construction (step 2) and the combination of 
sub-criterion variables (step 3) deserves special emphasis. From 
reports of validation studies found in the literature, it may be 
judged that the usual first step in criterion development is the 
search for available criterion measures. The psychologist employing 
this procedure very often arrives at a decision as to 
criterion content that is undesirably influenced by factors of 
availability. The discovery of several already available or 
readily obtained measures that are apparently suitable is in¬ 
clined to lead to neglect of the systematic observation and 
analysis necessary to insure that all important aspects of on-the- 
job productivity have been identified. In choosing criteria on 
the basis of availability, method of measurement as well as 
nature of variables usually is also a function of convenience 
rather than of desirability. Without accomplishing step 1 before 
deciding upon the means by which the criterion variables are 
to be measured, a systematic consideration of alternate methods 
of scale construction or measurement and choice of the optimal 
method for each criterion variable is not likely to be made. 
While it is recognized that, in many cases, the final decision as 
to the method of measurement will have to be made in the light 
of economy and available research resources, it is the firm belief 
of the authors that there is generally enough freedom of choice 
within the limitations imposed by even a policy of strict expe¬ 
diency, to justify the type of analysis proposed. At least the 
decision can be made with full and explicit recognition of the 
basis for making it. It might be added, parenthetically, that the 
careful accomplishment of step 1, in addition to insuring the 
adequacy of criterion variables, frequently serves the additional 
function of supplying valuable clues as to possible predictors. 
Savings realized through this means may in part, if not entirely, 
offset the extra cost and effort required to make a thorough 
observation and analysis. 

Criterion Bias and Predictor Correlation 

To this point, our classification and discussion of bias have 
been in terms of the criterion alone. Since effort expended in 
constructing a bias-free criterion is, as we have stressed before, 
directed ultimately toward the proper choice and weighting of a 
battery of predictors, it is essential to consider the effect of 
criterion bias on the degree to which this objective is realized. 

Biasing factors correlating with the predictors will obviously 
distort the validities and the partial regression weights of the 
various predictors. They may even result in the inclusion of 
tests in the battery that predict only bias and have no relationship 
to the “true” criterion. The introduction of bias having no 
relation to the predictors is, on the other hand, equivalent, in 
effect, to an increase in the error of measurement of the criterion. 
The relationship of all predictors to the criterion will be 
attenuated. But this attenuation will be proportional for all 
predictors. Consequently, the relative magnitude of the validities 
and the partial regression coefficients will be unaffected. 
This leads to the highly important conclusion that the “true” 
validity of the weighted composite resulting from the validation 
study remains substantially unaffected by test-free bias, even 
though the exact magnitude of this validity cannot be esti¬ 
mated. With these considerations in mind, we may further 
classify biasing factors into those which are predictor correlated 
and those which are predictor free. 
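
The contrast drawn above can be illustrated with a small simulation. The sketch below is not taken from the paper: the two predictors, the weights 0.6 and 0.3, the size of the bias terms, and the names `pearson` and `simulate` are all assumptions chosen only for illustration. A predictor-free bias is added to a hypothetical "true" criterion in one case, and a bias that depends on the second predictor is added in the other.

```python
import random
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(n=20000, seed=0):
    """Return the validities of two predictors against three criteria.

    All weights and variances here are hypothetical illustration values.
    """
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical "true" criterion: x1 matters twice as much as x2.
    true_crit = [0.6 * a + 0.3 * b + rng.gauss(0, 0.5)
                 for a, b in zip(x1, x2)]
    # Predictor-free bias: unrelated to either predictor.
    obs_free = [t + rng.gauss(0, 1) for t in true_crit]
    # Predictor-correlated bias: the contaminant depends on x2.
    obs_corr = [t + 0.8 * b for t, b in zip(true_crit, x2)]
    return {
        "true": (pearson(x1, true_crit), pearson(x2, true_crit)),
        "test_free": (pearson(x1, obs_free), pearson(x2, obs_free)),
        "test_correlated": (pearson(x1, obs_corr), pearson(x2, obs_corr)),
    }


if __name__ == "__main__":
    for label, (r1, r2) in simulate().items():
        print(f"{label:16s} r1={r1:.3f} r2={r2:.3f} r1/r2={r1 / r2:.2f}")
```

Under these assumptions, both validities shrink when the bias is predictor free, but their ratio stays close to the ratio obtained against the "true" criterion. When the bias is correlated with the second predictor, that predictor appears more valid than the first, which reverses the ordering that holds for the "true" criterion.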

The authors do not wish to imply that the attenuating effect 
of test-free bias is of little import. In addition to the attenuation 
of the validity coefficients and partial regression weights, two 
other undesirable results will accrue from the introduction of 
test-free bias into the criterion: (1) The sampling error of the 
validity and regression weights will tend to increase, thus 
rendering these statistics less stable from sample to sample, and 
(2) biasing factors that are test free may, none the less, distort 
estimates of the reliability of the criterion in an indeterminate 
manner. 

The first of these faults may be overcome by increasing the 
size of the experimental population if additional cases are avail¬ 
able with, of course, a resulting increase in the cost of the re¬ 
search. The problem of correcting for the unknown effect of 
test-free bias on criterion reliability is more difficult, and 
possible solutions are usually less satisfactory. Such possible 
solutions are, in any event, particular to the nature of the bias¬ 
ing factors. 

In spite of these adverse effects of test-free bias, it is believed 
that, effectively, it is the presence or absence of test-correlated 
bias that “makes” or “breaks” the criterion. 

Criterion Deficiency 

Before beginning our discussion of criterion deficiency, a dis¬ 
tinction should be made between criteria designed to measure 
over-all proficiency on a particular job and those concerned 
with success in specific job elements. The validation problems 
involved are both legitimate. In the latter case, it may be 
desired to measure success in a job element common to a wide 
variety of job classifications in order to validate a test designed 
specifically to predict this element. Adequate validation samples 
can sometimes be obtained only by combining groups from 
a wide variety of jobs, all of which share the concerned element. 
The problem of criterion deficiency would not usually be pertinent 
to validation studies of this nature. Our concern, in any 
event, will be exclusively with criterion deficiency as it occurs 
in criteria of general on-the-job success. 

Criterion deficiency is present to a greater or less degree in all 
studies involving criteria of general success. While it is doubtful 
that a criterion could be built which would take into account 
all aspects of on-the-job performance, it is the authors’ opinion 
that the high incidence of deficiency may be avoided by a more 
systematic approach to the problem of determining criterion 
elements. In the light of our earlier discussion of the relation¬ 
ship between biasing factors and the steps essential to criterion 
construction, it is apparent that it is in step 1—the analysis of 
the situation in which the criterion behavior occurs—that cri¬ 
terion deficiency is most likely to materialize. Adaptation of the 
principles of worker analysis can probably be made so as to 
minimize criterion deficiency in prediction problems. 

The systematic investigation of the situation in which the 
criterion behavior occurs serves several valuable functions. 
First, it minimizes the possibility of overlooking important 
criterion elements. Second, it supplies the investigator with 
valuable clues as to the most practical means of measuring the 
several criterion elements. Third, the analysis supplies some 
initial estimates of the relative importance of the several cri¬ 
terion elements. Thus, if available facilities require limitation 
of the criterion to a bare minimum, an intelligent judgment may 
be made as to which elements may be omitted from the study 
with least harm. Finally, an analysis of the criterion situation in 
advance of any other steps in the study will generally shed con¬ 
siderable light on the nature of the predictors most likely to be 
valid. This can eliminate considerable loss of valuable testing 
time and may result in batteries of greater validity than would 
usually be the case with predictors chosen on a less sound basis. 

The “critical incident” technique for the construction of 
rating scales as expounded by Flanagan (7) appears to offer 
promise as a means of reducing criterion deficiency in rating. 
Not enough is yet known concerning the use of the method to 
permit a considered judgment of its value for this purpose. 

One factor frequently making for criterion deficiency is the 
inclination of investigators to employ only one type of criterion 
measure. Studies using ratings usually use only ratings; those 
in which production records are used, use only production 
records; where job samples are employed, neither ratings nor 
production records are likely to enter into the picture. If an 
adequate analysis of the job situation were accomplished and a 
decision as to criterion content were made before consideration 
is given to the most desirable measuring techniques for each 
job element, it would seem that production records would often 
be found most desirable for some of the criterion elements and 
ratings or job samples most desirable for other elements. 

Composite criteria consisting of a variety of production in¬ 
dexes seem, in practice, to be most frequently and most obvi¬ 
ously subject to criterion deficiency. The difficulties involved 
in devising and putting into operation the procedures necessary 
to obtain production records for those job elements for which 
none already exist often constitute the determining factor in 
such instances of criterion deficiency. A systematic approach 
to criterion construction will do much to minimize such bias. 
If the important job elements influencing over-all efficiency are 
isolated first of all, gaps in the total job picture become more 
readily apparent and measures may be obtained of those ele¬ 
ments necessary to complete the criterion composite in the 
manner that is most practical in the particular situation. If it is 
found at that time that production records cannot be made 
available for the measurement of all criterion elements, ratings, 
job samples, or other means may be devised to eliminate the 
gaps in the composite. 

In considering criterion deficiency in relation to rating 
criteria, we must distinguish between over-all ratings and com¬ 
posites derived from separate evaluations for each element. In 
the latter case, there is the same need for systematic analysis of 
the job situation for the determination of the elements to be 
evaluated as in the construction of production record or mixed 
criteria. Generally, rating criteria, whether as separate element 
ratings or as over-all, undertake to account for a larger part of 
the total job than is the case with production criteria. Thus, 
criterion deficiency is probably somewhat less prominent in 
ratings than in ordinary production record criteria. Bias undoubtedly 
does occur because of improper weighting. It should 
be pointed out that so little weight is given to some factors that 
the criterion distortion introduced practically amounts to 
criterion deficiency. 

It should be recognized that in the use of over-all ratings of 
effectiveness, the problem of criterion deficiency has not been 
solved. Rather, it has been placed in the laps of the raters. The 
extent to which such rating will be deficient depends, of course, 
upon the extent to which each of the raters has included each 
of the important elements of success in making his rating. It 
may be expected that different raters will incorporate different 
elements into their composites and that, in effect, there will 
be a different amount and kind of criterion deficiency in the 
estimates obtained from different raters, if not in different rat¬ 
ings made by a single rater. When limitations of the research 
study require the use of over-all ratings as the criterion, it 
would seem advisable to incorporate a careful definition of the 
important job elements in the directions for the execution of the 
ratings. This, if properly accomplished, should help to reduce the 
extent of criterion bias and to insure that the evaluations of the 
several raters are predicated on a more uniform constellation of 
elements than would otherwise be the case. 

The foregoing comments appear to provide sufficient con¬ 
sideration of criterion deficiency in relation to ratings. Because 
of the effect of halo (discussed below), it is difficult to consider 
this problem intelligently. Ratings of different job elements are 
often found to be so highly interrelated that one suspects that 
the rater's impression of the ratee’s competence is the only 
determining factor of general importance. Because of this 
effect, the authors do not wish to give the impression that ad¬ 
herence to the foregoing suggestions will produce substantial 
improvement in the results obtained. 

An examination of research reports indicates that, in general, 
systematic job analysis is an initial step in the construction of 
job-sample criteria more often than in the construction of any 
other type of criterion measures. In spite of such systematic 
job analysis, it is the authors' opinion that important elements 
of on-the-job success are usually omitted from job-sample 
criteria. Much of the difficulty in this respect arises because the 
job-sample criterion indicates how well the employee can 
perform under standard conditions rather than how well he 
does perform under normal work-a-day conditions. It could 
possibly be argued that for the validation of aptitude and 
achievement tests, as opposed to personality measures, this is 
precisely what is desired. Where on-the-job success is a function 
of personality variates, however, job-sample criteria are apt to 
be criterion deficient. As a result of the exclusive use of such 
criteria, truly valid measures of personality differences would 
be excluded from the battery selected for operating use. Thus, 
while the use of job-sample criteria may be recommended for the 
evaluation of production in certain types of situations, it is 
doubted that they should ever be used alone as a measure of 
over-all on-the-job success. 

Criterion Contamination 

Criterion construction based on arm-chair considerations or 
factors of availability, rather than on an analysis of the job 
situation, faces not only the danger of omitting important fac¬ 
tors but also that of incorporating variables that are not meas¬ 
ures of on-the-job success. While contaminants of the criterion 
occur in the process of deciding what to measure, it is in the 
construction of the actual scales, or other means of measure¬ 
ment, that the investigator most frequently faces the problem 
of contamination. 

From our outline of the steps in criterion construction it will 
be noted that procedures and/or instruments for making such 
measurements must be devised as a second step following the 
determination of the job elements in need of measurements. 
In discussing contamination in relation to the major types of 
criterion measures, the broader meaning of the term as em¬ 
ployed here should be borne in mind. The more conventional 
usage of the term limits it to contamination introduced by direct 
influence of predictor scores on the criterion. The basic 
example is the effect of knowledge of predictor scores on criterion 
ratings. Bellows (1) extended the meaning of the term to 
include such phenomena as opportunity bias and artificial 
restriction of production. In the present paper, as has previously 
been noted, any source of variance in the criterion, other than 
error of measurement, that is not a reflection of on-the-job success 
is labelled “criterion contamination.” Thus, our definition 
includes all extraneous elements in the criterion. However, 
several additional concepts will be introduced to aid in dis¬ 
tinguishing between different types of contamination. 

In production records, contamination most frequently occurs
because factors beyond the control of the individual worker
considerably affect the amount of his production. This type of
contamination has been referred to as opportunity bias.

Examples of opportunity bias may be cited for almost any 
type of job. In evaluating salesmen such bias may occur be¬ 
cause of differences in the "goodness” of territory; in evaluating 
production line workers, it may occur because of differences 
between day and night shifts, in the location of the work site, in 
tools and machines, in the efficiency of supervisors, or in work¬ 
mates and repairmen. Differences between day- and night- 
shift workers may be substantial even though no differences 
exist as to potential productivity. If samples of production are
obtained at different times for different workers, diurnal varia¬ 
tions in productivity may bias the obtained criterion scores. 
Thus, it is known that work output definitely varies according 
to the time of day. Hence, records of production obtained on
individuals at the time of optimal output would be biased in 
relation to those obtained at the time of minimal output. A 
comprehensive listing of the sources for opportunity bias is 
impossible. A careful analysis of the conditions of work of the 
various members of the experimental group during the collec¬ 
tion of criterion measures is necessary to insure identification of 
such biasing factors.

The most important question to be answered with reference 
to opportunity bias is the degree to which it is test correlated or 
test free. First of all, the possibility that components of the 
experimental predictor battery were employed in determining 
who would be placed in the position where opportunity for 
high production record was greatest, should be checked. For 
example, if tests or other variables in the predictor battery were 
employed to determine which salesman obtained the best terri¬ 
tory or which sales clerk was given the best counter, etc.,—as 
might often occur if the selection procedures being validated
were actually used in the operating selection program—the 
effect on validity of such biasing factors could be very con¬ 
siderable. 

Suppose that 10 per cent of the variation in amount of pro¬ 
duction in a given job were due to some form of opportunity 
bias, and that placement in a position of greater opportunity 
had been in terms of a test employed for the prediction prior to 
the initiation of a research study. If, in this research study, this 
predictor was evaluated along with other experimentally con¬ 
structed instruments, we can compute that the obtained va¬ 
lidity (biased by the opportunity factor) would be .32, even 
though its actual validity were zero. It may be seen that the 
resulting contamination would be highly destructive to the 
objectives of the research study. 
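
The figure of .32 can be reproduced with a short computation. The sketch below is an illustration and is not taken from the original analysis. It assumes that the criterion is a sum of uncorrelated components, that the predictor is perfectly correlated with the opportunity component (because it alone determined placement), and that the predictor is unrelated to every other component. The function name and its parameters are hypothetical.

```python
import math

def spurious_validity(bias_share, r_test_bias=1.0):
    """Validity produced purely by test-correlated opportunity bias.

    bias_share  -- proportion of criterion variance due to opportunity bias
    r_test_bias -- correlation of the predictor with the bias component
                   (1.0 assumes the test alone determined placement)
    Assumes the predictor is unrelated to every other criterion component.
    """
    if not 0.0 <= bias_share <= 1.0:
        raise ValueError("bias_share must lie between 0 and 1")
    # corr(test, criterion) = r_test_bias * sd(bias) / sd(criterion)
    return r_test_bias * math.sqrt(bias_share)

validity = spurious_validity(0.10)
print(round(validity, 2))  # 0.32
```

Under these assumptions the obtained validity is simply the square root of the contaminated share of variance, so even a small share can yield a sizable coefficient.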

Even if no direct evidence of relationship is found between 
any predictor and opportunity bias in criterion scores, evidence 
of indirect relations should be sought. If seniority were to 
determine placement in the position of greatest opportunity, 
predictor variables such as age and experience would show 
heavily biased validity. Personal history items, bearing directly 
or indirectly on the length of experience or age, would have 
similarly biased validities. Other possibilities may be cited. 
Questionnaire items relating to marital status may appear to 
have high validity because a much higher percentage of
non-married workers choose to work on the night shift. A measure of
aggressiveness may falsely appear to be a valid predictor of 
sales records because the more aggressive salesman pushes him¬ 
self into the advantageous sales-territories. 

While the possibility that opportunity bias may be test
correlated should be thoroughly checked, it is probably true
that the correlation will usually be found to be negligible.
In most cases, in other words, opportunity bias will be test
free; it will attenuate all validity coefficients but will not
seriously distort their relative magnitudes.
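
The proportional character of this attenuation can be shown in a few lines. The sketch below is illustrative only. It assumes the contaminant is uncorrelated both with the predictors and with the uncontaminated part of the criterion, in which case each validity is multiplied by the square root of the uncontaminated share of variance. The function name and the two validity values are hypothetical.

```python
import math

def attenuated_validity(true_validity, bias_share):
    """Validity against a criterion carrying test-free contamination.

    true_validity -- correlation with the uncontaminated criterion
    bias_share    -- proportion of observed criterion variance due to bias
    """
    if not 0.0 <= bias_share < 1.0:
        raise ValueError("bias_share must be at least 0 and below 1")
    return true_validity * math.sqrt(1.0 - bias_share)

true_validities = {"test_a": 0.50, "test_b": 0.25}  # illustrative values
observed = {name: attenuated_validity(r, 0.10)
            for name, r in true_validities.items()}

# Every coefficient shrinks by the same factor, so their ratio is unchanged.
ratio_before = true_validities["test_a"] / true_validities["test_b"]
ratio_after = observed["test_a"] / observed["test_b"]
print(observed, ratio_before, ratio_after)
```

Because every coefficient is scaled by the same factor, the rank order and relative size of the validities survive, which is the sense in which test-free bias is the less damaging kind.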

A second frequently mentioned contaminating factor in pro¬ 
duction records is the one introduced by limitations on rate of 
production. Such limitations may occur because of assembly line 
production, because men work in teams, because of social pressure
from other workers, or from a number of similarly operating
factors. These are not biasing in one sense of the term. If pro¬ 
duction of the faster workers cannot exceed that of the slower 
workers by more than 50 per cent, the observed difference is 
truly representative of the full advantage to be obtained by 
hiring the fastest in preference to the slowest worker for that 
given job situation. Of course, if the effect of a change in the 
composition of the efficiency of all members of the assembly 
line—or group—could be measured, the problem would be con¬ 
siderably changed. In order to obtain a measure of such effects 
it would be necessary to depart from the usual correlational 
methods of validating tests. It would be necessary to select 
groups with differing average productivity, to assign all mem¬ 
bers of each group to a given assembly line and to compare the 
mean productivity of these groups. In such comparison of 
groups, experimental controls would have to be established;
that is, the conditions of work, and all factors influencing out¬ 
put, would need to be equalized for all groups, with variation 
between groups limited to the difference in predicted produc¬ 
tivity. While the method for handling this special problem 
bears mention, extended discussion is not possible at this point. 

While the effect of such factors is not contaminating in the 
sense indicated above, results due to the presence of such factors 
cannot, of course, be generalized to situations where such limi¬ 
tations are not present. The most obvious conclusion to be 
drawn when limitation on production is discovered, is that 
selection programs are likely to be of limited value. 

To save time and money, a large proportion of the industrial 
selection researches are conducted on in-service personnel, i.e.,
tests are administered to and criterion data are collected on
personnel already in the employ of the sponsor. Such cross-sectional
studies, while necessary, are always, to some degree,
defective in experimental design. In practice, test scores are
necessarily obtained prior to employment or to any other personnel
action based on them. The conduct of cross-sectional
studies may introduce two types of contamination when their
results are applied to the employment situation. The first is a
test contamination arising from the fact that both the on-the-job
experience, and the nature of the conditions under which
the predictors are administered, may exercise considerable 
influence on test scores. This, being a predictor rather than 
criterion contamination, need not further concern us here. 

The collection of criterion data on an in-service population 
may, however, introduce an experience-contamination which is 
of direct concern to us. Where the job is one in which produc¬ 
tion may be expected to rise with increased experience and there 
is considerable variability in the tenure of the validation population,
the criterion will, of course, be contaminated with
experience. If the predictors also include experience-correlated 
variables such as age, the contamination will be predictor corre¬ 
lated. If the tests are also experience-contaminated, such tests 
will show a spuriously high correlation with the criterion. 

Predictors such as information or proficiency tests, and tests
of knowledge of terminology, would tend to show positively
biased validities in cross-sectional validation studies.
Knowledge of terminology and productivity would both tend 
to be greater in experienced than in inexperienced workers 
even though there might be no relation between the two 
measures among workers with equal experience. Bias of this 
nature may be avoided by testing prior to employment, by 
administering all tests to groups with constant amounts of 
experience, or by controlling experience statistically. The dan¬ 
ger of such bias does not have bearing, obviously, on the 
utilization of experience prior to employment for the given job 
as a predictor. 
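
One common way of controlling experience statistically is the first-order partial correlation. The sketch below is illustrative; the paper does not specify which statistical control is intended, and the three correlations are invented to show the case in which a test's apparent validity is due entirely to experience.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denominator

# Hypothetical values: a terminology test and a production criterion each
# correlate .60 with length of experience and .36 with each other.
r_test_criterion = 0.36
r_test_experience = 0.60
r_criterion_experience = 0.60

controlled = partial_correlation(
    r_test_criterion, r_test_experience, r_criterion_experience
)
print(round(controlled, 2))
```

With these values the obtained validity of .36 falls to essentially zero once experience is held constant, which corresponds to the case of no relation among workers with equal experience.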

Estimates of the reliability of production criteria are probably
more often, and more seriously, distorted by biasing factors
than are validities. Bellows (1) has pointed out that in
many jobs where unequal opportunity seriously affects the 
production records of a category of workers, it is likely that a 
second measurement of the productivity of these workers will 
be obtained with the same biasing factors in operation and with 
the same workers showing spuriously high productivity. For 
example, if production records were obtained during two differ¬ 
ent intervals on a population of workers including those on day 
and night shifts, it would probably be found that day-shift 
workers produced more during both time intervals. The apparent
reliability of the production measure would be quite high
even though its actual reliability were below usual standards of 
acceptability. 
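
A simple variance decomposition illustrates the point. The sketch below is one possible formalization and is not taken from the paper. It assumes each production record is the sum of a stable true component, a stable bias component (such as shift) that recurs in both measurement periods, and independent error. The variance figures are invented.

```python
def apparent_and_actual_reliability(var_true, var_bias, var_error):
    """Return (apparent, actual) reliability under an additive model.

    apparent -- correlation between two measurements that share the same
                stable bias component
    actual   -- reliability the measure would have with the bias removed
    """
    if var_true <= 0 or var_bias < 0 or var_error < 0:
        raise ValueError("var_true must be positive; other variances non-negative")
    apparent = (var_true + var_bias) / (var_true + var_bias + var_error)
    actual = var_true / (var_true + var_error)
    return apparent, actual

# Hypothetical variances: shift differences dominate true differences.
apparent, actual = apparent_and_actual_reliability(
    var_true=1.0, var_bias=3.0, var_error=2.0
)
print(round(apparent, 2), round(actual, 2))  # 0.67 0.33
```

The stable bias is counted as consistent variance in the test-retest correlation, so the apparent coefficient is raised even though it reflects nothing about the workers themselves.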

The construction of rating scales free of contamination 
presents, possibly, more serious problems than detection and 
elimination of contamination from production records. It 
should be stressed initially that all of the sources of bias dis¬ 
cussed in connection with production records will probably also 
tend to influence ratings of productivity. It is possible, however, 
that raters may be successful in making allowance for some of 
these factors—opportunity biases, for example—and thus 
reduce their influence. 

The most obvious and probably the most serious source of 
contamination peculiar to ratings arises because of the so-called 
halo effect. 

The term “halo” implies a spurious relationship between
rated traits, attributed to a spread of the rater’s
attitude toward, or estimate of, the ratee in one dimension
over to his attitude toward, or estimate of, the ratee in other,
unrelated dimensions. Various factors have been postulated as
the source of the halo effect. Degree of personal liking is fre¬ 
quently mentioned as a possible source. Over-all impression, 
social prestige and outstanding achievement in a particular 
field are other possible sources. As yet there is no evidence allowing
definite conclusions regarding the source of the halo effect.
It may be regarded as established, however, that some factor 
or factors operate spuriously to increase the relationship be¬ 
tween ratings on different characteristics. 

Since the source of halo cannot be established, it cannot be 
regarded as necessarily a contaminating factor. Bingham (2), 
in discussing the role of halo in criterion ratings, expresses the 
belief that there are a number of situations in which the general 
impression that the individual makes on those he comes into 
contact with, can itself be an important criterion element. He 
concludes that halo should not, in all cases, be considered an 
undesirable attribute of criterion ratings. 

It would be agreed by most, however, that Bingham’s con¬ 
clusion, even if correct, gives no sound solution to the problem 
of halo in criterion ratings. Even though halo reflects important 
elements of on-the-job proficiency, it would be desirable to obtain
adequate estimates of proficiency in the various aspects of
the job, free of halo, in order to insure that separate job ele¬ 
ments are properly weighted in arriving at an over-all composite. 

Halo effect, if contaminating in nature, can become test 
correlated and thus assume considerable importance, particu¬ 
larly when the prediction battery includes ratings, personality 
measures and ability tests. In such a situation, the criterion 
and predictor ratings may show spuriously high correlation 
because of halo effect common to both. Personality measures 
may likewise show spuriously high validities through the pre¬ 
diction of the contaminating halo element. Since, at the same 
time, validities of ability-test scores would probably be attenu¬ 
ated, the partial regression weights for the entire battery would 
be considerably distorted. The tendency reported by Bingham 
and Freyd (3) for personality measures to show relatively higher
validities against rating criteria and for objective tests to show 
relatively higher validities against production record criteria, 
may be explained, at least in part, by the biasing effect on the 
validities of personality measures noted above. It is particu¬ 
larly important to note that direct criterion contamination 
may result from a remote source. Variables which influence 
criterion ratings need not be members of the prediction battery 
in order to distort the validities and regression weights. If the 
variables which influence the criterion scores are correlated 
with any in the battery, the resultant criterion contamination 
will be test correlated; if such variables are uncorrelated with 
members of the predictor battery, the contamination will be 
predictor free. 

A source of criterion contamination in ratings similar in its 
effect to opportunity bias arises from differences in the mean 
values obtained from different raters. Employees of a tough 
rater will receive lower criterion scores than will those of an 
easy rater. Normally, the resulting contamination will be test 
free. However, if assignment to various supervisors is made on 
the basis of test scores, such bias can be predictor correlated. 
Conrad (4) has contended that such differences in rater tend¬ 
ency are over-emphasized and that proper rating techniques 
will minimize such differences. 

A basic source of contamination in ratings arises from the
failure of raters or of rating-scale constructors to distinguish 
between those observations which constitute direct evidence 
of productivity and those which give only inferential evidence 
of productivity. To this source of criterion contamination the 
authors would like to give the name “error of illation.” Thus,
ratings on the efficiency of a carpenter based on observations
on the skill with which he uses his hands, the air of assurance
with which he handles tools, or even the correctness of his choice
of tools for each operation, are all inferential and, without empirical
evidence, cannot be assumed to have high relationship to
actual productivity. Even though such relationship were established,
it could not be assumed that such trait ratings could be
substituted for direct measures of productivity without biasing 
effect on the validation results. 

In designing forms that incorporate scales for the measure¬ 
ment of such traits as manual skill, industriousness and ambi¬ 
tion, the psychologist promotes this form of contamination. 
Ratings on such traits give rise to the danger that resulting 
evaluations may not only have been inferred, but that they 
may have been inferred, in large part, from events observed in 
a social situation or in other situations having no necessary 
relation to on-the-job productivity. Evidence on the highly
specific nature of psychological traits from studies by Hartshorne
and May (10) is pertinent in showing the dangers of
such bias.

The tendency of raters to consider the symptoms of productivity
rather than productivity itself can probably never be
entirely eliminated. It should be possible in many work situa¬ 
tions, however, to identify the individuals with the greatest 
opportunity to observe the actual production element and to 
orient the scales so that the evaluations given by the rater 
involve as few deductions and as much direct observation as 
possible. The directions and content of the scales can be so 
oriented that they specifically request the rater to base his 
evaluation on direct observation of results. Even though it is 
improbable that the desired purpose will be entirely accomplished,
the technician should at least not be guilty of encouraging
a tendency toward inference rather than direct observation
by phrasing his directions and scales in terms of indirect or
inferential content. 

Having constructed scales oriented toward direct evidence 
of productivity and having determined who is in the best posi¬ 
tion to evaluate each directly, the technician may take one 
additional step to help reduce errors of illation. The raters 
may be instructed, well in advance of the collection of the criterion
ratings, to observe and to take note of behaviors falling
in the areas to be rated. Mention should be made of the fact 
that such oriented observation is an integral part of the “criti¬ 
cal incident” technique mentioned above. 

The bias introduced by the illation error is probably very 
often test correlated. Trait ratings obtained prior to employ¬ 
ment might give excellent prediction of ratings of traits thought 
desirable for efficient performance but not actually related to 
quantity and quality of production. If so, nothing will have been 
demonstrated. Personality tests related to the traits thought 
desirable would similarly yield inflated validity coefficients. 

The danger of contamination in the use of achievement-test
scores as criteria of success in training or in school is
considerable. Probably, also, such contamination will be test
correlated. Frequently, information tests are employed along 
with aptitude and other measures to predict achievement in 
training. Such achievement is also measured by an informa¬ 
tion test administered at the end of training. Test constructors 
working on both the predictor- and criterion-information tests 
may well employ the same source material for constructing 
items and may well both err in the same direction in selecting 
items irrelevant to or unimportant in the actual training proc¬ 
ess. Such common but irrelevant content in the predictors and 
criterion can naturally be expected to produce test-correlated 
contamination. 

It is probable that a similar biasing effect is often obtained 
in relating any ability-test measures to success in training. 
Generally speaking, ability-test scores have shown uniformly 
high validities in this area. Such validities are suspect, however, 
since they are obtained by relating initial test scores to measures 
of proficiency after training. Woodrow (17) has shown that 
initial test scores show little relation to improvement with 
practice. He has also shown that general-intelligence-test scores 
(often interpreted as measures of learning ability) have little
if any relation to improvement in scholastic achievement. Since 
the essential problem is the prediction of benefit derived from 
training, lack of evidence contradictory to that reported by 
Woodrow suggests that predictors of training success or train¬ 
ing improvement have doubtful validity for that purpose. The 
selection instruments may, of course, still have value in pre¬ 
dicting on-the-job success. To assume such validity, knowing 
only that the predictors correlate with estimates of achievement 
in training, assumes that achievement in training is highly 
related to on-the-job success. Little, if any, research has been
reported demonstrating a positive relationship between training 
success and later success on-the-job. The low correlation of the 
academic achievement of West Point Cadets (8) with later 
success as Army officers, argues strongly that training success 
cannot be assumed to have appreciable relationship to success 
on-the-job. 

Job-sample criteria are possibly less subject to contamination 
than any of the criteria discussed. Opportunity can be carefully
controlled. Halo, effect of easy-hard raters, etc., can be reduced 
to a minimum. Ratings of work products, while subjective, 
differ in character from ratings of individuals. In rating work 
products, raters need not know the individuals whose products 
are being evaluated and the effect of personal likes and dislikes 
of the rater can thus be eliminated. 

However, because of the similarity between the test-like 
character of the situation under which the job-sample meas¬ 
ures are obtained, and the usual conditions under which tests 
in a predictor battery are administered, contamination, test- 
correlated in nature, is probably often present in job-sample 
criteria. Individuals who become overexcited or nervous in the
one situation may tend to show the same type of behavior in the 
second. Similarly, individuals who put forth greater effort when 
being watched, would be apt to do so in the type of situation in 
which both tests are administered and job-sample measures are 
obtained. It is possible, also, that if tests and job-sample per¬ 
formances are obtained on the same day, factors peculiar to the 
day of testing will act as test-correlated contamination and 
introduce a positive bias into the validity coefficients. 

A type of criterion scale in which the possibility of contamination
is easily overlooked is that in which “high” and “low”
groups are employed. For some not-too-apparent reason, in¬ 
vestigators seem to feel that by selecting extreme groups they 
have circumvented the problem of contamination and have 
secured "pure” cases. It is strongly emphasized that the selec¬ 
tion of such groups is based on a continuum either actual or 
implicit. Extremes on this continuum may be extreme on a 
contaminating element as well as on the “true score” component
of this continuum. Such measures should be as carefully
scrutinized for contamination as any continuous criterion. 

Investigators often show a similar tendency to neglect prob¬ 
lems of bias in using a group-membership criterion. In the 
situation in which members of one occupation are compared 
with members of other occupational groups, or with the 
general population, the opportunity for criterion contamination 
is extensive. Where the “in” group has been test selected and 
the same or correlated predictors are included in the experimental
battery, the presence of extensive test-correlated contamination
is almost certain.

Even where prejudices, rather than tests, dictated entrance 
into the occupational group, predictor-correlated contamination 
may be expected. If an executive, for example, arbitrarily ruled 
that all messengers coming into the firm should be high-school 
graduates, and employed messengers were compared with some 
general group, education and educational achievement tests 
would show substantial validity even though their true validity 
were negligible. 

The composition of occupational groups is determined by 
factors determining the initial choice of occupation and by 
attrition after such initial choice. Factors responsible for choice 
of occupation are almost certainly a source of contamination; 
those responsible for attrition may be of value for criterion 
purposes. It is not usually possible to obtain any reasonably 
exact information concerning the major factors in either case. 
Because of this lack of information, if for no other reason, such 
a criterion is suspect. 

The preceding discussion of contamination could not be 
completely comprehensive. In any individual research study, 
contamination peculiar to that study may be discovered. We 
have endeavored, however, to clarify and illustrate the nature
and effect of the more important general factors. 

Criterion Scale Unit Bias 

While the presence of scale-unit bias in criteria has frequently
been recognized, particularly in connection with ratings,
the general problem has not been extensively explored.
A review of the psychological literature provides little evidence 
allowing an estimate of the prevalence or seriousness of scale- 
unit bias in the criteria of validation studies. 

Basically, it is believed that the problem centers in the ab¬ 
sence of an adequate rationale. There is no generally accepted 
means of judging the presence or absence of scale-unit bias 
available to the investigator desirous of evaluating the relative 
merits of various possible types of scale units or scaling pro¬ 
cedures. 

Possibly the only widely used standard of adequacy of scale 
units is the degree of approximation of the obtained frequency 
distribution to a normal curve. While this standard may be of 
some value in avoiding serious distortion of scale units, it 
must be remembered that normality is always an assumption. 
Standards are needed that will allow checking the adequacy of 
scale units in a particular example without the necessity of such 
an assumption. From a logical viewpoint, a standard for judging
the presence or absence of scale-unit bias that applies to the
shape of the frequency distribution is in any event defective.
The distribution form is a function of the population involved 
as well as of the scale units. Normality should certainly not be 
considered desirable where there is strong presumptive evi¬ 
dence that selection of cases has occurred. 

It is fortunate that, in general, product-moment validity 
coefficients do not seem to be seriously affected by alteration of 
scale units so long as rank order is unchanged. When test scores 
are converted to normalized form, or when ratings obtained in 
rank-order form are normalized, product-moment validities are 
usually very little altered. 
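
The robustness of the product-moment coefficient to a rank-preserving change of scale units can be checked on simulated data. The sketch below is illustrative and uses invented values throughout: the sample size, the population validity of .60, the seed, and the exponential stretch are all assumptions chosen only to produce a visibly skewed but order-preserving rescaling of the criterion.

```python
import math
import random

def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
test_scores = [rng.gauss(0.0, 1.0) for _ in range(1000)]
# Criterion with a population validity of .60 (hypothetical).
criterion = [0.6 * x + 0.8 * rng.gauss(0.0, 1.0) for x in test_scores]
# Monotone stretch: rank order is kept, scale units are distorted.
stretched = [math.exp(0.3 * y) for y in criterion]

r_original = pearson(test_scores, criterion)
r_stretched = pearson(test_scores, stretched)
print(round(r_original, 3), round(r_stretched, 3))
```

In runs of this kind the two coefficients differ only slightly. More violent distortions of the scale would be expected to produce larger changes.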

While the product-moment validity for the entire range is
probably little affected by scale-unit distortion, validity indexes
computed for particular points of cut on the predictor may
be seriously affected. Where scale-unit bias is suspected, such
coefficients should be interpreted with caution. 

We might note also that a heavily skewed criterion distri¬ 
bution, if established as genuine, would have implications of 
some significance for efficient selection. Individuals on the tail 
of a skewed distribution could undoubtedly be identified with 
greater confidence than those in the same percentile point on a 
normal curve. Thus, while the problem of scale units may not 
be of great significance when conventional methods of analysis 
are employed, a solution to the problem that would allow identi¬ 
fication of highly skewed distributions with confidence could 
lead to improved efficiency of selection through different 
methods of analysis.

From the criterion point of view, the scale-unit problem re¬ 
duces to one of establishing units which represent equal incre¬ 
ments in terms of the over-all efficiency of the organization. 
This point will be elaborated by the authors in a forthcoming 
paper on that topic. 

In terms of the efficiency of the organization, production 
records appear to be relatively free from scale-unit bias. An
additional object produced has equal value whether it increases
the productivity measure of an individual from 1 to 2 or from
99 to 100. A given error is just as costly no matter whether it
increases the error score from 4 to 5 or from 19 to 20. Such units 
have meaning in their own right. Even in the evaluation of 
quality of production, differences in quality can be assigned 
values having direct meaning if the resulting objects of differing 
quality are eventually sold for different prices. Quality differ¬ 
ences would then acquire a quantitative monetary value. This 
cannot, however, always be accomplished. 

Ratings are subject to a number of forms of criterion scale- 
unit bias. Piling at the upper end of the scale, failure to employ 
the lower scale units, piling in the center of the scale and other 
defects have all been frequently reported in the literature. Since 
these tendencies appear in wide varieties of rating situations, it 
seems reasonably certain that they are distortions of the scale 
units and are not due to the nature of the true distribution of 
the degree of productivity in the job element being rated. 

Lack of information as to the true or proper distribution 
form considerably hampers the solution to the problem of scale-unit
bias in ratings. While it seems reasonably certain that the
scale-unit biases mentioned above do often occur, it is difficult 
to judge in any particular instance when a rating scale is free of 
scale-unit bias and, more particularly, the nature of such bias 
as may be present. 

In the absence of evidence to the contrary, a normal distribu¬ 
tion of criterion rating scales would usually indicate freedom 
from scale-unit bias. If the distribution of production records, 
on the job element being rated, is known from other research 
studies, such distributions would probably provide a sounder 
basis for judging the adequacy of the distribution form of the 
criterion ratings than would the normal curve.

In the use of order-of-merit rankings it is apparent that the 
form of the distribution is forced and that equal numbers of 
individuals fall within each interval of a given magnitude. If, 
however, rankings are obtained from a number of different 
raters it will usually be found that the average of the rankings 
will approximate the normal curve to a satisfactory degree.

No problems of scale-unit bias arise which are peculiar to 
job-sample criteria. If job-sample criteria are scorable in pro¬ 
duction units, comments made with reference to scale-unit bias 
in production units will apply here also. If scoring is subjective, 
problems similar to those encountered in rating scales will 
occur. It seems probable, however, that scale-unit bias will be 
less extreme than that occurring in direct evaluation of indi¬ 
viduals. The direct evaluation of production has the added ad¬ 
vantage that in some cases it can be divorced from the indi¬ 
vidual to some degree and thus escape, in part at least, some of 
the biases which stem from the interpersonal relations between 
rater and ratee. 

Achievement tests employed as criteria involve scale-unit 
biases of a nature peculiar to continuous variables obtained by 
summing a number of dichotomous items. Where rating scales 
are so constructed they will also be subject to this form of scale-unit
bias. Variation in the difficulty level (i.e., percentage of
raters checking a given item) will have considerable effect upon 
the distribution form of the total score. The effect here is very 
similar to the effect of item-difficulty distribution on factor 
structure of tests discussed in some detail by Ferguson (6)
and Wherry and Gaylord (16). The frequency of occurrence in
the population of a component element of a criterion scale is 
analogous, in other words, to the difficulty level of component 
items of a test insofar as the statistics of their interrelations are 
concerned. If a criterion consists of high difficulty elements, it 
will tend to correlate more highly with tests also consisting of 
high difficulty items and less highly with tests consisting of low 
difficulty items. 
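
The dependence of correlation on matched difficulty can be illustrated with the upper bound of the phi coefficient for two dichotomous items. What follows is a minimal sketch that assumes the standard bound for fixed marginals; it is not taken from Ferguson or from Wherry and Gaylord. The function name `phi_max` and the pass rates are hypothetical.

```python
import math


def phi_max(p_a: float, p_b: float) -> float:
    """Largest phi attainable between two dichotomous items.

    p_a and p_b are the proportions passing (or checking) each item.
    With the smaller rate called lo and the larger hi, the bound is
    sqrt(lo * (1 - hi) / (hi * (1 - lo))).
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("pass rates must lie strictly between 0 and 1")
    lo, hi = sorted((p_a, p_b))
    return math.sqrt((lo * (1.0 - hi)) / (hi * (1.0 - lo)))


if __name__ == "__main__":
    # A hard criterion element (20 per cent pass) paired with items of
    # increasing easiness: the attainable correlation shrinks.
    for p_a, p_b in [(0.2, 0.2), (0.2, 0.5), (0.2, 0.8)]:
        print(f"p_a={p_a:.1f}  p_b={p_b:.1f}  phi_max={phi_max(p_a, p_b):.3f}")
```

Under these assumed rates the bound is 1.0 when difficulties match and falls as they diverge, which is one way to see why a criterion built from high-difficulty elements favors tests built from high-difficulty items.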

When a criterion variable consists, then, of a number of dis¬ 
crete items, the investigator should take care to insure that the 
difficulty level or the “frequency of occurrence” level corresponds
to the frequency of occurrence of the job element in
the work situation. To accomplish this purpose, a difficulty 
distribution should probably be determined for each set of cri¬ 
terion components, and the number of observations or measures 
at each level should be made to adhere to this predetermined
distribution. 

Criterion Distortion 

An additional source of bias, which we have referred to as 
“criterion distortion,” arises as a result of the improper assign¬ 
ment of weights to the several elements. More broadly defined, 
criterion distortion would include all of the other types of bias 
discussed. Thus, criterion deficiency is the assignment of weights 
of zero to elements that should in reality have non-zero weights. 
Criterion contamination is the opposite error: the assignment of
non-zero weights to elements that merit no consideration.
Criterion scale unit bias in effect assigns different weights to
different parts of the continuum of the given criterion element.

A number of techniques have been proposed for determining 
the proper weights for criterion elements. We may do well to 
examine several of the procedures that have been proposed and 
to investigate the type of situation in which each is most ap¬ 
propriate. 

Horst (11) and Edgerton and Kolbe (5) have proposed procedures
which, in effect, operate to maximize the reliability of
the over-all criterion. The assumption implicit in these tech¬ 
niques is that all criterion elements measure the same basic 
variable and that the lack of perfect correlation between them 
is attributable to error of measurement. This procedure would
thus be quite applicable to situations in which the criterion
consisted of several measures of the same attribute, such as ratings
by different observers of the same trait. It seems evident, however,
that this technique should never be employed in combining
elements which attempt to assay behavior on different continua.
Unfortunately, the technique has often been used for
this latter purpose. It is the authors’ opinion that the chief
advantages of employing techniques developed by mathematical
derivation lie in the thorough and explicit manner in which
the assumptions must be stated. If the assumptions used are
ignored in applying the technique or formula developed, the
mathematical development is, in a sense, disadvantageous in
that it lends prestige to a formula completely unsuited to the
particular application.

Where no objective basis exists for the establishment of the
relative weights of criterion elements, weights obtained by
Toops’ (14) method of guessed Beta weights are, in the authors’
opinion, superior to an unweighted raw or standard score sum.
Toops proposes averaged estimates of the judged importance 
of the various criterion elements as a means of weighting, the 
judges being those personnel in the sponsoring agency having 
the best knowledge of the implications of various criterion 
elements for the efficiency of the organization as a whole. 
There are a number of technical problems involved in making 
clear to the judges the proper basis for guessed Betas. Consider, 
for example, the problem of obtaining weights for combining 
the number of production units and the number of errors. 
Should the evaluations requested be phrased so that raw-score 
weights are obtained or so that standard-score weights are 
obtained? Since the judges will probably not understand the 
effect of differences in the standard deviation of criterion ele¬ 
ments on their effective weighting, how can bias from this 
source be avoided? In spite of these problems in technique, the
method provides a direct approach to the basic problem of
weighting criterion elements. In addition, its acceptability to the
sponsor should be high. These factors, in the authors’ opinion,
suggest the advisability of a more extensive use of this
technique.
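
The distinction between raw-score and standard-score weights raised above can be made concrete. The sketch below assumes that a raw-score weight w applied to an element with standard deviation s acts like a weight of w times s on the standardized element. The function names and the production and error figures are hypothetical, chosen only for illustration.

```python
from statistics import pstdev


def effective_weights(raw_weights, element_scores):
    """Standard-score weights implied by a set of raw-score weights.

    Each raw weight is multiplied by the standard deviation of the
    element it is applied to.
    """
    return [w * pstdev(scores) for w, scores in zip(raw_weights, element_scores)]


def raw_weights_for(judged_weights, element_scores):
    """Raw-score weights that realize judged standard-score weights.

    Assumes every element has a non-zero standard deviation.
    """
    return [b / pstdev(scores) for b, scores in zip(judged_weights, element_scores)]


if __name__ == "__main__":
    units = [400, 450, 500, 550, 600]   # hypothetical production units
    errors = [2, 4, 6, 8, 10]           # hypothetical error counts

    # Judges say "one unit offsets one error" in raw-score terms.
    implied = effective_weights([1.0, -1.0], [units, errors])
    print("implied standard-score weights:", implied)

    # Judges instead intend equal importance in standard-score terms.
    needed = raw_weights_for([1.0, -1.0], [units, errors])
    print("raw-score weights needed:", needed)
```

With these assumed figures the standard deviation of production is 25 times that of errors, so nominally equal raw-score weights let production dominate the composite. This is the bias the judges are unlikely to foresee.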

It should be stressed that the common practice of computing 
separate validity coefficients for the various subcriteria is
equivalent, in the final analysis, to a method of combining 
criteria scores. It differs in that the experimenter avoids a 
formal procedure. Instead, he merely looks at the validities 
against the several criteria and decides on the tests which are 
to constitute the selection battery. Such a procedure has the 
effect of concealing from the research worker himself the fact 
that he is deciding the relative importance of the sub-criterion 
variables. Usually, the investigator will decide to include sev¬ 
eral tests for the prediction of each of the criteria, and will fail 
to consider the relative importance of the criteria or to evaluate 
properly the effect of the intercorrelations and validities or the 
partial regressions for predicting a composite. The problem is 
thus evaded rather than solved. In general, a formal solution 
will at least make explicit the basis for the decisions concerning 
the relative importance of the several criteria and will avoid 
incidental errors which may creep in because of carelessness in 
the subjective handling of the data. 

A suggestion by Otis (12) may, in particular instances, lead 
to a more meaningful combination of subcriterion scores than 
would result from the application of any of the procedures so 
far mentioned. Otis pointed out that, in key-punch operation, 
it was discovered that the correction of an error required the 
time equivalent to that needed for punching 14 cards. He
suggested, consequently, that a total over-all production index 
could readily be obtained simply by subtracting 14 cards for 
every error made. 
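
The index Otis describes amounts to a single subtraction, sketched below. The 14-card cost per error is the figure reported above; the function name and the operator figures are hypothetical.

```python
CARDS_PER_ERROR = 14  # time cost of correcting one error, in cards, as reported by Otis


def production_index(cards_punched: int, errors: int,
                     cards_per_error: int = CARDS_PER_ERROR) -> int:
    """Over-all production index: cards punched less the card-equivalent of errors."""
    if cards_punched < 0 or errors < 0:
        raise ValueError("counts must be non-negative")
    return cards_punched - cards_per_error * errors


if __name__ == "__main__":
    # Hypothetical operators: the faster one is not the more productive one.
    print("operator A:", production_index(1000, 10))
    print("operator B:", production_index(950, 2))
```

In this illustration the operator who punches fewer cards but makes fewer errors obtains the higher index, because both sub-criteria have been expressed in the same unit.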

The method of combining criteria suggested in this particular
instance is not exactly a technique and does not suggest any
uniform procedure that can be widely employed. It does suggest,
however, that detailed examination of the relationship
between the different work units measured and the organization
of the over-all productive process will often suggest that certain
different sub-criteria are, or can be, expressed in units which
are equivalent in their effect upon the total productivity of the
organization.

The effect of the use of inappropriate weights for criterion 
elements, as with other forms of bias, will depend upon the 
extent to which it is predictor free or predictor correlated. The 
overweighting of any given element will naturally afford undue
weight to the predictors that have the highest correlation with 
the overweighted element or elements. Conversely, the pre¬ 
dictors that correlate highest with underweighted elements 
would be given inadequate weight. Prediction would hence be 
distorted and while in a selection problem, for example, the 
predictors would align the population in accord with the criterion
as weighted, this alignment would be at variance with
the “true” criterion.
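
The effect of overweighting on apparent validity can be sketched with the standard formula for the correlation between a predictor and a weighted sum of standardized elements. The formula is a textbook result, not one stated in this paper, and all correlations and weights below are hypothetical.

```python
import math


def composite_validity(weights, predictor_rs, element_corr):
    """Correlation of one predictor with a weighted composite criterion.

    weights       weights applied to the standardized criterion elements
    predictor_rs  correlation of the predictor with each element
    element_corr  correlation matrix among the elements
    """
    n = len(weights)
    covariance = sum(w * r for w, r in zip(weights, predictor_rs))
    variance = sum(weights[i] * weights[j] * element_corr[i][j]
                   for i in range(n) for j in range(n))
    return covariance / math.sqrt(variance)


if __name__ == "__main__":
    # Two uncorrelated criterion elements (hypothetical).
    elements = [[1.0, 0.0], [0.0, 1.0]]
    test_a = [0.6, 0.0]  # correlates only with element 1
    test_b = [0.0, 0.6]  # correlates only with element 2

    for label, w in [("equal weights", [1.0, 1.0]),
                     ("element 1 overweighted", [3.0, 1.0])]:
        a = composite_validity(w, test_a, elements)
        b = composite_validity(w, test_b, elements)
        print(f"{label}: test A {a:.3f}, test B {b:.3f}")
```

Under equal weights the two hypothetical tests are equally valid; tripling the weight of the first element raises the apparent validity of the test that correlates with it and depresses that of the other.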

The reader may readily judge that the authors consider 
most procedures for criterion combination in current use to be 
not wholly adequate. This appears to be an area particularly in 
need of further research. 


Summary 

This paper proposes a classification of criterion bias into 
four main categories:

1. Criterion deficiency 

2. Criterion contamination 

3. Criterion scale unit bias 

4. Criterion distortion 

Each category is discussed in terms of the steps in the
criterion-construction process in which it is most likely to occur.
Each type of bias is also considered in relation to the several
kinds of criterion measures. The importance of distinguishing
between bias that is test free and bias that is test
correlated is emphasized throughout.

Biasing factors reported in the literature have been con¬ 
sidered. Additional concepts have been advanced by the 
authors. 


AN INVESTIGATION OF TWO HYPOTHESES REGARDING
THE NATURE OF THE SPATIAL-RELATIONS
AND VISUALIZATION FACTORS 1

WILLIAM B. MICHAEL
Princeton University 
and 

WAYNE S. ZIMMERMAN and J. P. GUILFORD
University of Southern California 

Primarily as a consequence of the factorial analyses of tests 
of intellectual abilities, the construct of a spatial and/or visual 
ability amenable to psychological measurement has received 
increasing attention in recent years. During the past twenty-one
years, at least a score of investigators have identified in
their writings a space factor. In a pioneer study, Thurstone 
(16) included among his seven primary mental abilities a fac¬ 
tor labelled S, which he characterized as a “facility in spatial 
and visual imagery,”—a factor which he likened to the spatial 
or visual group factor found by Kelley (13) in earlier experi¬ 
ments. The same factor was identified in other studies carried 
out subsequently by Thurstone (17) and by Thurstone and 
Thurstone (18). 

During World War II members of the psychological research 
units of the Army Air Forces devoted a considerable amount 
of time and effort to the development of tests of the “spatial- 
visual” type to be used in the selection of men for air-crew posi¬ 
tions. Several factorial studies which have been described in a 
research report of the AAF Aviation Psychology Research 
Program, edited by Guilford (3), have indicated that the variance
associated with Thurstone’s spatial-visualization factor
may be separated into two apparently independent factors
identified to be spatial relations and visualization (visual manipulation).
In fact, in addition to these two factors (abbreviated
by the symbols S1 and Vz), two other less definite space factors,
S2 and S3, and a factor tentatively identified as visual
memory, also appeared in several analyses.

1 . . . study, their cooperation in making subjects available, and their assistance in
administering a number of the tests. To all Rutgers students who participated, special thanks, . . .

In two recent studies both Fruchter (2) and Dudek (1) have 
found separate factors of spatial relations and visualization. In 
his investigation as to the nature of verbal fluency Fruchter 
reanalyzed a sub-matrix of twenty tests selected from the 
battery of fifty-seven variables employed by Thurstone in his 
classical study previously cited (16). He found two indepen¬ 
dent factors which he described as being spatial-relations and 
visualization. 

Referring to the same Thurstone study, Zimmerman (3) 
pointed out that further rotations of the residual axis (Number
XII) with other axes which defined meaningful factors would 
produce a promising factor of visualization. Just recently, Zim¬ 
merman (in his unpublished doctoral dissertation) has rero¬ 
tated the twelve centroid axes for all fifty-seven variables and 
has confirmed his initial belief that both a spatial-relations and
a visualization factor would appear. 

Problem 

The purpose of the investigation was to test the validity of 
two (apparently unrelated) hypotheses that purport to repre¬ 
sent differences in the psychological properties of the factors 
of spatial-relations and visualization as reflected by correspond¬ 
ing differences, both in the respective contents of two types of 
tasks and in the respective work procedures required of the sub¬ 
jects for successful completion of them. Each type of task con¬ 
sisted of a group of three tests. Within each of the two groups 
of tests employed in the study there appeared to be not only 
a similarity in the format of the test items, but also a common 
approach or operation demanded of the examinee. 

In broad outline the plan followed in the investigation was 
to incorporate within a test battery two groups of tests which 
the investigators believed to be representative of the psychological
operations involved in the hypothetical statements as to
the nature of the spatial-relations factor and of the visualiza¬ 
tion factor. In the selection of each test to be incorporated 
within a group, introspection was freely employed as an aid to 
the determination of the psychological processes used in the 
subjects’ performance upon a test—the same processes sup¬ 
posedly as those indicated in the relevant hypotheses. To these 
six tests (i.e., two groups, each consisting of three tests) were
added eight reference tests of fairly well-known factorial con¬ 
tent to aid in the identification of those portions of variance in 
the six tests that were associated with other factors such as 
verbality, numerical facility, reasoning, and perceptual speed. 
The inclusion of other factor tests served not only to identify 
what probably without their presence would be large amounts 
of specific variance within each of the six tests, but also to in¬ 
dicate the relative degree of purity of each of these six tests with 
respect to the function it was hypothesized to measure. 2 

A sufficient, though not necessary, condition for the tena- 
bility of each of the hypotheses, would be that in the factor- 
analysis procedure each of the two groups of three tests would 
define a factor. Moreover, this factor should not appear to be 
weighted in other tests of the battery that were selected to 
measure other factors. If one or more tests within either group 
should be weighted substantially in variance associated with 
another factor, the evidence for the corresponding hypothesis 
would be less clear-cut, but not necessarily lacking. It would be 
quite possible, if not almost certain, that one or more of the 
three tests within a given group might be factorially complex. 
At the same time, however, all three tests within a given group 
might contain substantial amounts of variance in one factor 
that did not appear in any of the other eleven tests. 

Hypotheses 

The factor of spatial relations was hypothesized to represent
the ability to comprehend the arrangement of elements within
a visual stimulus pattern, primarily with reference to the human
body. Thus, an important implication in the ability to perceive
spatial arrangements is that the subject is able to distinguish
whether one object is higher or lower, left or right, or
nearer or farther than another within the same field. Through
the presentation of two simulated views of a stimulus pattern,
a test item may be constructed such that there is a systematic
relationship between the order of elements within the first
spatial pattern (the stimulus component of a test item) and the
order of elements within the second pattern (the response
component of a test item).

2 It was also thought to be very desirable to determine whether tests of the type
used by the AAF and Thurstone’s tests held in common factors identified as being the
same. This is the first study the writers know of that will serve to check upon the belief
that many of the Thurstone primary abilities and the AAF factors are identical. Only
the Thurstone space factor is here called into question.

For example, in Thurstone’s Cubes test the examinee is asked 
to recognize whether the designs on the sides of a second cube 
can hold the same relationship to one another as they do on the 
first cube. By noticing within each cube the left-right, top- 
bottom, and front-back interrelationships of the faces, the sub¬ 
ject is able in each item to refer the locations of three designs 
on three exposed faces of one cube to the locations of designs 
on the faces of the other cube. In Thurstone’s Flags test the 
examinee is required to tell whether the exposed faces of two 
American flags of identical size can represent the same side of 
the flag. Relating corresponding left-right and top-bottom 
boundaries (outlines) of the two flags appears to be an impor¬ 
tant aspect of the solution. Similarly, in Guilford and Zimmer¬ 
man’s test of Spatial Orientation a premium is placed upon the 
examinee’s maintaining the correct relationship of objects to 
one another in background scenery that has been viewed twice 
from a motorboat, first before and then after its prow has
moved up or down and/or left or right. In the test the examinee
is asked to determine the relative amount and direction of 
movement of the boat corresponding to changes in the two 
views of the background setting. 

The factor of visualization was hypothesized to represent an 
ability that requires the mental manipulation of visual images. 
In contrast to another factor identified as visual memory (3), 
which appears to be a static or reproductive form of visual¬ 
ization, the factor referred to as visual manipulation, or simply 
visualization, is dynamic. This visual manipulative ability appears
to be present in the solution of problems in which the individual
finds it necessary mentally to move, rotate, turn, twist,
or invert one or more objects. Following the performance of the 
presented manipulation the individual is required to recog¬ 
nize the new position, location, or changed appearance of the 
object or objects. 

Three tests selected to yield evidence for the second hypoth¬ 
esis included two by Thurstone, Punched Holes and Form 
Board, and one by Guilford and Zimmerman, Spatial Visual¬ 
ization. In the test of Punched Holes the examinee is presented 
a symbolic representation of a folded sheet of paper into which 
one or more holes have been punched and is required to imagine 
where the holes will be when the sheet is unfolded. In the
second Thurstone test the examinee apparently finds it necessary
in each item mentally to turn, rotate, or invert two or
more flat geometric figures in such a way that they can be
placed together to fit within the outline of a larger geometric
figure. In each of the tests, the examinee is asked to record the
final positions respectively of the holes and of the geometric
figures. In the test of Spatial Visualization the subject is required
mentally to turn, tilt, or rotate a three-dimensional
object (an alarm clock) drawn on a sheet of paper into a final
position according to written instructions. As alternative responses
the pictures of the clock are presented in five positions,
one of which is correct. (A more detailed description of these
three tests follows in the next section.)

Whereas in the two Thurstone tests the examinee is required 
to draw in his solution to the problem, in the third test he 
merely selects as his solution one of five choices presented. It 
is quite likely that in addition to measuring visual manipu¬ 
lative ability other factors are involved in the three tests—fac¬ 
tors reflecting the manner in which responses to the items are 
recorded.

Another important difference in the nature of the psycho¬ 
logical processes hypothesized for the spatial relations and 
visualization factors was that of speed of response. As indicated 
by findings in the AAF Aviation Psychology Program, the 
tests thought to measure the spatial relations factor were ad¬ 
ministered with fairly short time limits, but those tests thought 
to measure visualization were given with fairly liberal time allowances.
The spatial relations factor was considered to demand
a fairly rapid decision on the part of the examinee as to
the spatial position of objects with reference to his own loca¬ 
tion; whereas, the visualization factor was believed to be rep¬ 
resented in problems requiring a more deliberate and less auto¬ 
matic approach. In part, such a distinction may be a function 
of the complexity of a task (i.e., the number of steps entering
into the performance of an item), the more complex tasks re¬ 
quiring visualization for their solution. 

Concerning the psychological properties of spatial-relations 
and visualization factors, one other important difference has 
been suggested in the work of one of the psychological research 
units of the AAF, as follows: 

The idea for Flight Orientation [a test] was proposed at 
the time Aerial Orientation [another test] was being developed.

It was hypothesized (1) that the ability visually to maneuver 
an airplane as if from a position outside the cockpit is a mani¬ 
pulatory-visualization ability and (2) that the ability to imag¬ 
ine maneuvers taking place as if the examinee were within the 
cockpit is a spatial-orientation ability. 

The Aerial Orientation test utilized cockpit views of outside 
terrain to be matched with depicted plane attitudes; the 
visualization-of-maneuvers tests involved only views of airplanes
seen from a position outside of the cockpit. . . . Flight
Orientation was designed to fulfill the requirements of the
indicated variation—a test that would utilize only cockpit 
views of outside terrain. From hypotheses given above, it 
follows that Aerial Orientation should measure a combination 
of manipulatory-visualization and spatial-orientation abilities, 
while Flight Orientation should be a purer measure of the
ability to orient in space (3). 

That the two groups of tests selected for investigating the 
validity of the hypotheses may actually contain variance in
both the spatial-relations and visualization factors would not 
be surprising, inasmuch as many subjects on the basis of their 
own introspective reports revealed that they made use of the 
two psychological processes associated with the respective hy¬ 
potheses in tests selected to represent the implications of only 
one hypothesis. For example, if in the Flags test the subject 
is able, so to speak, to pick up the flag, move it, turn it about 
as if he actually has a model in his hands, then visualization is 
believed to be dominant. On the other hand, if the subject is 
concerned primarily with the left-right and top-bottom orientation
of edges of flags with respect to his own position, or if he
has to move himself to a different position as in cocking his 
head to one side, then a spatial factor is believed to be more 
prominent. 

Similarly, in the Cubes test if the subject reports he picks 
up the first cube and rotates it into a final position which 
matches (or cannot match) the second cube, then the visual¬ 
ization process is dominant. However, if he attempts primarily 
to interrelate the positions of the sides of the cubes with respect 
to his own position, or if he appears to project himself amidst 
the cubes as if he were walking about them and relating the 
locations of various sides with respect to his own position, then 
the spatial-relations factor is probably operative. It may well
be that in the spatial-relations factor empathy plays an im¬ 
portant role in the relating of the position of objects to one’s 
own location, whereas in visualization the individual obtains 
first from a distance an overall view of the objects to be manip¬ 
ulated and then employs perhaps some rather restricted kines¬ 
thetic imagery in the imagined use of hands for moving the 
objects into their required positions. 

Despite the apparent differences in approach employed by 
many subjects, it did appear that the two groups of tests 
chosen represented reasonably well a distinction between the 
psychological processes hypothesized. If a test did involve to a 
substantial degree the use of two or more psychological abili¬ 
ties, it was thought that the factor-analysis procedure would 
reveal such a fact. 


Tests 

In Table I are presented the names of the fourteen pencil- 
and-paper tests employed in the battery, the maximum num¬ 
ber of items that could be attempted, the plan followed with 
respect to “speed” or “power” time-limit, the actual working 
time allowed, and the scoring formula used. The numbering of 
the tests in the tables, as well as in the following description 
of content and procedure, corresponds to the order of adminis¬ 
tration. During the first, second, third, and fourth testing 
periods, respectively, the following groups of tests were ad- 
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ministered: i, 2, 3, and 4; 5, 6, 7, 8, and 9; 10 and ii; 12, i^ ( 
and 14. An ample number of practice exercises preceded the 
main body of cadi test. Further information concerning several 
of the tests may be found both in a manual (11) and in the 
literature (12, 16. 18, 20). It is believed, however, that the 
descriptions given will suffice for the interpretation of the fac¬ 
tors to be presented. 

TABLE 1

The Test Battery: Descriptive Data

                                              Number       Timing Plan       Working                   Scoring
Name of Test                                  of Items     (Speed or Power)  Time                      Formula

 1. Guilford-Zimmerman Verbal Comprehension   [illegible]  Power             10 min.                   R-W/4
 2. Guilford-Zimmerman General Reasoning      [illegible]  Power             [illegible]               R-W/4
 3. Guilford-Zimmerman Numerical Operations   [illegible]  Speed             5 min.                    R-W
 4. Guilford-Zimmerman Perceptual Speed       [illegible]  Speed             3 min., 45 sec.           R-W
 5. Guilford-Zimmerman Spatial Orientation    [illegible]  Speed             8 min.                    R-W/4
 6. Thurstone [Verbal] Completion             [illegible]  Power             7 min.                    R-W/4
 7. Thurstone Number Series                   70           Power             8 min.                    R
 8. Thurstone Identical Forms                 40           Speed             3 min., [illegible] sec.  R-W
 9. Thurstone Cubes                           [illegible]  Speed             5 min.                    R-W
10. Thurstone Flags                           [illegible]  Speed             4 min.                    R-W
11. Guilford-Zimmerman Spatial Visualization  40           Power (limited)   15 min.                   R-W/4
12. Thurstone Punched Holes                   [illegible]  Power             7 min.                    R
13. Thurstone Pattern Analogies               70           Power             10 min.                   R-W/4
14. Thurstone Form Board                      38           Power             7 min.                    R


1. Guilford-Zimmerman Verbal Comprehension.—This is a vocabulary
test in which the examinee is required in each item to
choose among five words, all matched with respect to difficulty,
the one word which most closely approximates the meaning of
the stimulus word. Items increase in difficulty progressively
from the beginning to the end. Even-numbered items were
omitted. Responses were recorded on a separate answer sheet.
Most examinees attempted all items.
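
The three scoring formulas listed in Table 1 (R, R-W, and R-W/4) can be applied with one small routine. This is a sketch only: R is taken as the number right and W as the number wrong, with omitted items counting toward neither, which is the usual reading but is an assumption here. The function name and the example counts are hypothetical.

```python
from typing import Optional


def formula_score(rights: int, wrongs: int,
                  wrong_divisor: Optional[int] = None) -> float:
    """Score a test as R, R - W, or R - W/4.

    wrong_divisor=None  rights only (R)
    wrong_divisor=1     rights minus wrongs (R - W)
    wrong_divisor=4     rights minus one fourth of wrongs (R - W/4)
    """
    if rights < 0 or wrongs < 0:
        raise ValueError("counts must be non-negative")
    if wrong_divisor is None:
        return float(rights)
    if wrong_divisor < 1:
        raise ValueError("wrong_divisor must be at least 1")
    return rights - wrongs / wrong_divisor


if __name__ == "__main__":
    # Hypothetical answer counts for one examinee.
    print("R - W/4:", formula_score(30, 8, wrong_divisor=4))
    print("R - W:  ", formula_score(20, 5, wrong_divisor=1))
    print("R:      ", formula_score(12, 3))
```

The divisor is passed in directly instead of being derived from the number of response alternatives, because the table assigns R-W to some tests that do not have two alternatives.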

2. Guilford-Zimmerman General Reasoning.—This test is composed
of arithmetical-reasoning problems similar to those encountered
in courses in general mathematics, elementary algebra,
and intermediate algebra. Diagrams accompany a few of
the problems. Numerical work is kept to a minimum. Five
multiple-choice responses are presented with each problem
statement. Items increase in difficulty level progressively from
the beginning to the end. Even-numbered items were omitted.
Responses were recorded on a separate answer sheet. Most examinees
attempted all assigned items.

3. Guilford-Zimmerman Numerical Operations.—This test is
in four parts, consisting of numerous simple problems (of about 
the same difficulty level) involving respectively the four funda¬ 
mental operations of addition, subtraction, multiplication and 
division. Emphasis is placed in the directions upon the need 
for both accuracy and speed of work. Subjects were told to 
begin with the part upon addition, to work every item, and to 
go as far as possible in the allotted time. Only a few subjects
reached the fourth section upon division. Responses to the
items were printed in spaces on the test booklet adjacent to 
the problems. 

4. Guilford-Zimmerman Perceptual Speed.—This test requires
the examinee to match a visual object of a familiar shape and
of detailed design with one of five other visual objects of a common
category (e.g., automobiles, boats, hats, shoes). Four of
the five response objects resemble rather closely the stimulus
object, but differ from it in certain minor details of shape and/or
design. For each common category two parallel sets of visual
objects (four stimulus and five response objects) are arranged
in two parallel columns. To each one of the four stimulus
figures in the first column corresponds one of the five response
figures. Thus, four responses are scored for each item of homogeneous
content. All items represent a low level of difficulty.
Answers to the items were marked on the test booklet in spaces
adjacent to each stimulus object. The examinees were told to
go as far as possible in the allotted time. No examinee finished.

5. Guilford-Zimmerman Spatial Orientation.—This test requires
an examinee to determine how the position of a boat
has changed in a second picture from its initial position in a first 
picture. In each picture the prow of the motorboat, in which 
the examinee is told to pretend to be riding, is shown along 
with background scenery consisting of water, or a silhouetted 
shore line, and in some instances of other boats intervening between
the shore line and the prow of the motorboat, which is
in the extreme foreground of the picture. In the sample prob¬ 
lems described in detail in the directions, the position of the 
prow in the second picture, with respect to the spot of back¬ 
ground sighted over it in the first picture, is taken as the 
primary reference guide for determination of the direction and 
amount of subsequent up-down and/or left-right motion of the 
boat. Movement is also indicated by accompanying shifts in 
the location of elements within the pattern of visible back¬ 
ground scenery. The boat is actually stationary with respect 
to any forward-backward motion. To each set of two pictures 
five alternative responses are presented. Each response is represented
by (1) a dot designating the aiming point, the initial
spot in the background sighted right over the point of the prow
in the first picture, and (2) an arc (of about 45°) representing
the location of the prow in the second picture with reference to 
the aiming point. One of the five responses shows the correct 
change in position of the prow of the boat with respect to the 
aiming point. All examinees were instructed in the limited 
time allowed to attempt as many items as possible. As in all 
other speed tests, answers were recorded in the test booklet. 
The difficulty of the items tends to increase for items further 
removed from the beginning of the test. No one attempted 
every item. 

6. Thurstone [Verbal] Completion.—This test is one adapted
from the Psychological Examination of the American Council on 
Education. Representing, probably, a combination of verbal 
comprehension and verbal fluency, it presents for each item the 
definition of a word, the number of letters in the word, and 
five alternative letters (responses), one of which represents the 
initial letter of the defined word. Although the items differ con¬ 
siderably with respect to difficulty, most of the defined words 
are familiar to college students. Responses were recorded on the
page of test items. Nearly every subject attempted all items.

7. Thurstone Number Series.—Found to be loaded in a factor
identified by Thurstone as induction, this test requires the 
subject to determine a rule for each item. Numbers are pre¬ 
sented in a row with two blanks inserted. The task is to find 
the mathematical principle by which the number series 
is formed and to insert in the blank that number which is appropriate.
The difficulty level of items increases in relation to
the position of the item from the beginning of the test. Re¬ 
sponses were recorded on the test sheets in the blanks inserted 
at various positions within the different number series. Most of 
the subjects attempted all items. One point of credit was given 
to each blank correctly filled (two points per item being maxi¬ 
mum score). 

8. Thurstone Identical Forms. —This test resembles rather 
closely the fourth test, Perceptual Speed, in that the examinee
selects from a row of five similar appearing figures that one
which is exactly the same as the stimulus figure. Slight differences
in color design and in shape appear among the five response
figures. In this test the items are also homogeneous with
respect to difficulty. The number corresponding to the sequential
position of the response selected was recorded on the
test page in a box to the right of the row of response objects.
Only a few examinees reached the last few items.

9. Thurstone Cubes.—In this difficult test the subject is asked
whether two drawings can represent the same cube on each face
of which there is supposed to be a different design. In each of
the two drawings the designs of three faces of the cubes are 
always exposed. If the two drawings can represent the same 
cube, a plus sign is placed in a blank square to the right of the 
two drawn cubes. If, on the other hand, the second drawing can¬ 
not represent the cube of the first drawing, then a negative sign 
is placed in the adjacent square. In the short time allowed no 
one attempted all items. 

10. Thurstone Flags.—On this test two flag pictures, of the
same size and of identical design, are presented occasionally in 
the same position, but generally in different positions. If the 
two drawings represent the same face of the flag, a plus sign 
is placed in a square on the test sheet just to the right of the 
two flags. If the two drawings represent opposite faces of the 
same flag, a minus sign is placed in the adjacent square. As in 
the test of Cubes the items were homogeneous with respect to 
difficulty. However, they were easy for most subjects. A few of 
the subjects attempted all items during the short period of time 
allowed. 

11. Guilford-Zimmerman Spatial Visualization.—This is a
test in which the examinee attempts to imagine the movement
of a clock in space from an initial position to a final position as
directed by a verbal statement. The test is divided into three
parts. In the first part, one movement of the clock is required
to effect the final position; in the second part, two movements
are called for; and, in the third part, three movements are indicated
by the directions accompanying each item. Three types
of movements are required. Each type of movement refers to
the revolution of the clock about an axis in one of three dimensions.
The actual movement involves a revolution of the
clock to the right or to the left a specified number of degrees.
The word “turn” is used to designate a revolution about the
base or the “6-12” axis, where the numbers refer to the numerals
representing hours on the clock. When the clock is tilted
such that the top moves either forward or backward, or in other
words, when the clock is revolved about the “3-9” axis, the
word “tilt” is employed. When the clock revolves about an
axis perpendicular to its face, the word “rotate” is used. In the
second part, two different types of movement are required, and
six permutations of sequence of movement are used. In the
third part, the same sequence of movements is followed in all
items (rotate, tilt, and turn). Nearly all of the subjects failed
to complete the entire test, but about 80 per cent attempted
all items in the first two parts. Items were scored up to the
point at which f >7 per cent of the group attempted them. 

12. Thurstone Punched Holes.—Each item in this test consists
of a series of figures representing a square sheet of paper
that has been folded by steps (as indicated by dotted lines) into
smaller square, rectangular, or triangular shapes. One or more
holes are punched into the final folded form. The task for the 
subject is to imagine where the holes will be when the sheet is 
unfolded. As an aid to the subject’s performance in the more 
difficult items one or more figures representing the appearance 
of the sheet of paper at intermediate stages of unfolding are 
presented. On the unfolded (square) sheet the subject indicates 
by drawing small circles where the holes will be. In the scoring 
of the item all holes must be properly spaced in relation to one 
another if credit is to be given. An item was scored right or 
wrong (no partial credits were given). Nearly every subject
completed all the items. 

13. Thurstone Pattern Analogies.—Adapted from similar tests
in the American Council on Education series, this test is composed
of items each of which consists of eight figures. The first
three (stimulus) figures are labelled A, B, C, and the next five
(response) figures are designated 1, 2, 3, 4, and 5. After the
examinee determines the rule by which figure A is changed to
figure B, he applies the rule to figure C and picks out among
the five arabic-numbered responses that one which satisfies the
requirements of the problem. In the more complex items the
examinee may frequently change his hypothesis as to the principle
connecting A and B in view of limitations imposed by the
nature of the five response figures. In the time allowed, most
subjects completed all items.

14. Thurstone Form Board.—Almost identical with the Minnesota
Form Board Test, except for the inclusion of printed instructions
and a practice exercise, this test consists of items
made up of several two-dimensional pieces (colored black) of
various geometrical shapes which the examinee attempts to fit
together in an appropriate arrangement within a larger geometric
form (uncolored figure within an outline). The subject
draws lines within the large white (uncolored) design to show
how the black pieces can be placed in order to fit within the
outline. Extreme accuracy in drawing was not required, but the
solution had to be indicated clearly. No partial credits were
given. Although the items became increasingly difficult as one
approached the end of the test, very few subjects failed to
attempt all the items in the time allowed.

The Sample

To a group of 500 male students enrolled in a two-semester 
course in beginning psychology at Rutgers University the bat¬ 
tery of fourteen pencil-and-paper tests was administered. Since 
four class periods, spread over the last part of the first semester 
and the first part of the second semester of the academic year, 
were required for completion of the project, many of the subjects
were not present at all class sessions. Makeups were given
in several instances. Complete results were obtained for 360
subjects. These individuals appeared to be a representative
sample of the University student body in light of biographical 
information obtained from each student. Consisting of 220 
freshmen and sophomores and 140 juniors and seniors majoring 
in virtually every department of the University, the sample was 
deemed satisfactory. Approximately 54 per cent of the subjects 
were veterans of World War II. The ages of the subjects ranged 
from 16 to 34, the median age being 22 years. 

In order that a satisfactory degree of interest might be sus¬ 
tained throughout the duration of the study, all students were 
told that they would be given their scores upon completion of 
testing in profile form. In fact, most subjects received scores on 
those tests completed during the first two class periods at the 
beginning of the third period. It was thought that additional
motivation might be provided if the temporal interval between 
taking the tests and receiving the scores was not too long. 

The Factor Analysis 

The matrix of test intercorrelations (all product-moment) 
presented in Table 2 was factor-analyzed by Thurstone's cen¬ 
troid method in the usual manner with one minor exception 
(13). In the reflection of signs the criterion was that used by 
the workers in several of the psychological research units of the 
United States Army Air Forces during World War II. The 
algebraic sum of a column, with the diagonal entry disregarded, 
was employed instead of the mere number of negative signs ap¬ 
pearing in a column. This procedure not only tends to guaran¬ 
tee positive sums but also appears to approximate more closely 
the maximizing of table totals than does the criterion involving 
number of negative signs. 
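
The reflection criterion just described can be illustrated with a short sketch. It is not a transcription of the worksheet procedure used in the study: the function name and the small residual matrix are hypothetical, and the loop assumes that the variable whose off-diagonal column sum is most negative is reflected first, the process being repeated until no column sum is negative.

```python
def reflect_by_column_sums(matrix):
    """Reflect variables until every off-diagonal column sum is non-negative.

    `matrix` is a symmetric correlation (or residual) matrix as a list of rows.
    Returns the reflected matrix and the sign (+1 or -1) applied to each variable.
    """
    n = len(matrix)
    m = [row[:] for row in matrix]          # work on a copy
    signs = [1] * n
    while True:
        # Algebraic sum of each column, diagonal entry disregarded.
        sums = [sum(m[i][j] for i in range(n) if i != j) for j in range(n)]
        worst = min(range(n), key=lambda j: sums[j])
        if sums[worst] >= 0:
            return m, signs
        # Reflect the variable: change the sign of its row and column.
        signs[worst] = -signs[worst]
        for k in range(n):
            if k != worst:
                m[worst][k] = -m[worst][k]
                m[k][worst] = -m[k][worst]


# Hypothetical residual matrix with two opposed clusters of variables.
residual = [
    [0.00, 0.30, -0.25, -0.20],
    [0.30, 0.00, -0.15, -0.10],
    [-0.25, -0.15, 0.00, 0.35],
    [-0.20, -0.10, 0.35, 0.00],
]
reflected, signs = reflect_by_column_sums(residual)
print(signs)  # [-1, -1, 1, 1]
```

Each reflection of a column with a negative sum raises the total of the table, so the loop must terminate; this is the sense in which the criterion approximates the maximizing of table totals.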

Because of marked discrepancies between obtained commu- 
nalities in the first set of centroid extractions and the estimated 
communalities in the diagonals, a second set of extractions was 
required. Following the second extraction (of seven centroid 
factors) the obtained communality of no test differed more than 
±.071 from the second estimated communality.

The criterion employed for cessation of extraction of the cen¬ 
troid factors was also that used by workers in the psychological 
research units of the AAF; namely, that factoring should not 
cease until the product of the two highest factor loadings is at
least less than the standard error of the corresponding correla¬ 
tion. Such a criterion tends to yield a greater number of factors 
than do most other criteria. The rationale underlying this less 
stringent criterion is that the maximum contribution which the 
factor makes to the scalar product of two test vectors, or to the 
correlation between two tests, is no greater than the chance 
relationship expressed by the standard error of the correlation 
coefficient. 
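
A minimal sketch of this stopping rule follows. Two points are assumptions rather than statements from the text: the standard error of the correlation is taken in its classical large-sample form, (1 - r^2) / sqrt(N), and the "two highest" loadings are taken to be the two largest in absolute value. The function names, loadings, and correlations are hypothetical.

```python
import math


def se_of_correlation(r, n):
    """Classical large-sample standard error of a product-moment r (assumed form)."""
    return (1.0 - r * r) / math.sqrt(n)


def may_stop_factoring(loadings, correlations, n):
    """True when the product of the two largest absolute loadings on the newest
    factor is smaller than the standard error of the correlation between the
    two tests carrying those loadings."""
    order = sorted(range(len(loadings)), key=lambda i: abs(loadings[i]), reverse=True)
    a, b = order[0], order[1]
    product = abs(loadings[a] * loadings[b])
    return product < se_of_correlation(correlations[a][b], n)


# Hypothetical intercorrelations among three tests, with N = 360 as in the study.
corr = [
    [1.00, 0.30, 0.20],
    [0.30, 1.00, 0.25],
    [0.20, 0.25, 1.00],
]
print(may_stop_factoring([0.13, -0.08, 0.05], corr, 360))  # weak factor
print(may_stop_factoring([0.40, 0.35, 0.10], corr, 360))   # substantial factor
```

With N = 360 the assumed standard error is roughly .05 for moderate correlations, so a factor is retained whenever two of its loadings have a product exceeding that value.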

Following the completion of a set of trial rotations, it was 
considered advisable to extract two more centroid factors as an 
aid to further rotations. It was known that probably only six 
factors would be meaningfully identified. However, previous 
experience has indicated that use of additional centroid axes 
in the rotation process frequently brings about, more readily, 
a psychologically meaningful solution. The superfluous factors 
eventually appear as mere residuals (factors containing insignif¬ 
icant amounts of communality) to which no interpretation can 
be dependably given. Moreover, the presence of residual factors 
seldom interferes at the conclusion of the rotation procedure 
with the interpretation of those principal factors which account 
for most of the common-factor variance. 

Fifty-six rotations of pairs of axes were required to satisfy 
Thurstone's criteria of positive manifold and simple structure. 
Each rotation was achieved graphically according to the method 
devised by Zimmerman (19). In general the structure determined
the direction and magnitude of each new rotation. Information
concerning the content of tests was put to use only toward
the end of the rotation procedure when minor adjustments
were made. In view of the large number of rotations the differences
between the communalities of centroid factors and final
rotated factors were negligible, the largest two discrepancies
being .017 and .013. An orthogonal reference frame appeared to
suffice for the interpretation of the factors. The final rotated 
factor loadings are shown in Table 4. 
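
The reason the communalities should agree before and after rotation can be shown with a brief sketch. Each graphical rotation of a pair of axes is an orthogonal rotation in a plane, which leaves the sum of squared loadings of every test unchanged; discrepancies arise only from rounding and from reading values off the plots. The function names and the angle are hypothetical, and the two columns of loadings are borrowed from the Vz and S values quoted later in the text purely for illustration.

```python
import math


def rotate_pair(loadings, p, q, degrees):
    """Rotate factor axes p and q through `degrees` (planar orthogonal rotation).

    `loadings` holds one row of factor loadings per test.
    """
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    rotated = []
    for row in loadings:
        new_row = list(row)
        new_row[p] = c * row[p] + s * row[q]
        new_row[q] = -s * row[p] + c * row[q]
        rotated.append(new_row)
    return rotated


def communalities(loadings):
    """Communality of each test: the sum of its squared factor loadings."""
    return [sum(x * x for x in row) for row in loadings]


loadings = [
    [0.62, 0.44],  # illustrative values only
    [0.52, 0.25],
    [0.42, 0.58],
]
rotated = rotate_pair(loadings, 0, 1, 20.0)
for before, after in zip(communalities(loadings), communalities(rotated)):
    print(round(before, 6), round(after, 6))
```

Exact arithmetic would give identical pairs; in hand computation carried to two or three decimals over many successive rotations, small differences of the order reported are to be expected.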

Interpretation of Factors

Inspection of the final rotated factor loadings in Table 4 reveals
that on the whole the criteria of positive manifold and
simple structure have been fulfilled. Six rotated factors were
meaningfully identified as visualization (Vz), verbality (V),
numerical facility (N), general reasoning (R), spatial-relations
(S), and perceptual speed (P). Two other factors (V1 and V2)
appeared that could not be satisfactorily defined, although their
weights in certain tests were suggestive of possible interpretations.
A ninth factor turned out as a residual with loadings
ranging from −.08 to +.13.

* Decimal points omitted.

† Communalities based on rotated factor loadings expressed to three decimal places.

Inasmuch as the primary purpose of the study centered about 
the investigation of the factors of spatial relations and visual¬ 
ization, the discussion relating to the identification and meaning 
of the other four factors will be kept to a minimum. The fac¬ 
tors, V, N, and P are actually doublets. However, since the 
factorial content of the pairs of tests weighted in these three 
factors was well known in advance of their inclusion within the 
battery, there is little reason to doubt the correctness of the 
identification given. 

It should be pointed out that the major loadings in some 
tests describing these three factors tended to be somewhat 
smaller than those reported in other studies or in manuals. This 
is due to the fact that many of the tests were shortened in order 
that they might be given within the time period available for 
testing.* However, in view of the size of the sample (N = 360),
loadings of .35 or greater are probably indicative of the presence
of a significant amount of variance in a factor.

Somewhat greater attention should probably be given to the 
interpretation of the factor R. Two tests, General Reasoning and 


* It is possible, however, to estimate what the loadings of these three factors, as well 
as the loadings of the other factors, would be if the tests were not shortened (p). When
a test is homogeneously changed in length the new factor loadings may be estimated 
by the formula 



where n = number of times the test has been lengthened, or the ratio of the length of
the new form to the original form;
k_mI = loading of factor m in the original, or unlengthened, test I;
k_mn = loading of factor m in the lengthened, or new, form of the test;
r_II = reliability of the unlengthened test.
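
The formula itself has not survived in this copy. Given the quantities defined above (the lengthening ratio n, the original and new loadings of factor m, and the reliability of the unlengthened test), it is presumably the familiar Spearman-Brown-type correction for a change in test length. The expression below is a reconstruction on that assumption, not a quotation from the source:

```latex
k_{mn} = \frac{k_{mI}\,\sqrt{n}}{\sqrt{1 + (n - 1)\, r_{II}}}
```

For n = 1 the expression reduces to the original loading, and for n greater than 1 it yields a somewhat larger loading, which agrees in direction with the corrected values reported in this footnote.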

If the shortened experimental forms of tests (1), (2), (3), (4), (5), and (11) are considered
to be extended to their original length, the corrected loadings in the principal
factor in each test are estimated to be, respectively, .712, .564, .657, .673, .587, and
.631, compared to the obtained loadings of .698, .537, .642, .664, .578, and .619 (which
are rounded to two figures in Table 4). The assumption is made in the speed tests that
the number of items completed per unit time remains constant.
Number Series, are loaded in this factor to the extent of .54
and .42, respectively. In view of the small number of items contained
in the shortened form of the first test (fourteen in all)
and of the consequent limitation imposed on the reliability of
the test, the magnitude of the first loading is substantial. Although
the factor may be tentatively described as relating to some type 
of reasoning function, it is not clearly defined. That it may rep¬ 
resent an ability to grasp the essential steps involved in the 
solution of problems presented in quantitative or symbolic 
terms appears to be a plausible interpretation. 

Interesting to note is the fact that factor V1 is loaded .39
and .41 in the two tests Number Series and Pattern Analogies,
respectively. A highly speculative interpretation would suggest
that this factor may be that of induction previously identified
by Thurstone (16). When the possible existence of an induction
factor is taken into account along with the fact that the test of 
Pattern Analogies received an insignificant loading of .09 in the 
factor R, it appears even more plausible that the factor R may 
represent an ability to diagnose a problem expressed in quanti¬ 
tative terms. If the interpretation of the R factor is correct, a 
significant finding is that a test (General Reasoning) can be
constructed to measure quantitative thinking without the in¬ 
troduction of substantial amounts of variance in the numerical 
factor. 

Examination of the loadings for the final rotated factors I 
and VII in Table 4 reveals positive, though not conclusive, 
evidence for the existence of two reference variables which 
may be meaningfully identified as spatial-relations and visual¬ 
ization. In short, the two hypotheses as set forth are, in the 
main, upheld—at least to the extent that the factorial com¬ 
position of the two groups of selected tests differs. 

In the following list of four tests, the first three of which 
were selected to test the hypothesis relating to the psychological 
processes involved in visualization, loadings of .35 or higher in 
all rotated factors including I (Vz), VII (S), and VI (V2) may
be summarized as follows:

Tests                           Factor I (Vz)    Other Factors

(11) Spatial Visualization           .62         .44 S
(12) Punched Holes                   .52         .36 V2 (.25 S)
(14) Form Board                      .52         .43 V2 (.22 S)
(5) Spatial Orientation              .42         .58 S

In view of the presence of weights of .52 or higher in three
tests, Spatial Visualization, Punched Holes, and Form Board
(all three of which made up the group intended to represent a
measure of visualization), factor I can be identified as visualization,
even though the test of Spatial Visualization is loaded
to the extent of .44 in factor VII (S). That the spatial-relations
and visualization abilities may be required in one or more tests
in either of the two groups of tests inserted in the battery was
mentioned previously as a definite possibility.

After taking the test of Spatial Visualization, many of the 
subjects reported that in addition to manipulating mentally
the stimulus figure (an alarm clock) into the final position 
called for by the verbal directions, they also related the loca¬ 
tion of various parts of the stimulus object (hands, numerals, 
top, base, winding and setting mechanisms of the clock) to the 
location of corresponding parts of one or more response figures 
(five alarm clocks in different positions). In the easier items 
which required only one manipulation the role of spatial cues 
is undoubtedly important. On the other hand, in those items 
requiring two or three movements of the clock, it would ap¬ 
pear that a greater dependence was placed upon manipulations 
of the clock; in fact, in the most difficult items variance asso¬ 
ciated with reasoning, verbal, and memory factors would possi¬ 
bly be important. However, only four items requiring a se¬ 
quence of three movements were scored. Nevertheless, a small, 
though perhaps insignificant, loading of .25 appeared in the R 
factor. In short, the influence of the range of difficulty of items 
upon the factorial content of a test may be substantial, as a 
previous study has shown (6). 

In two other tests, Punched Holes and Form Board, which 
were weighted heavily in the visualization factor, small load¬ 
ings of .25 and .22, respectively, appear in the factor to be 
identified as spatial-relations. More important, however, are 
the corresponding loadings of .36 and .43 in a factor V2. Although
not amenable to a dependable identification, this factor
may be associated with the drawing (filling in) response re¬ 
quired of the examinees. Despite their relatively high satura¬ 
tions in the visualization factor, these two tests appear to in¬ 
volve additional unknown factors. 

The visualization factor loading of .42 in the test of Spatial
Orientation, which was chosen to represent a measure of the
spatial-relations factor, is probably indicative of the use by 
some of the examinees of visualization. Introspective reports 
from subjects differed as to the technique used in working the 
items. The variance representing visualization ability may be
attributed to the tendency of several examinees mentally to 
manipulate the boat, as if it were a small toy, up and down 
and/or left or right and to imagine concomitant changes in the 
scenery. Many of the subjects reported that they did not place
themselves within the boat, but viewed the boat and scenery 
as if they were on a stationary platform some distance to the 
rear of the boat. One subject said that he pretended to be 
playing with a toy boat in a pond and to be sighting along the 
prow of the boat as a means of observing shifts in background 
scenery while he moved the boat with his hand to the right or 
left and/or up or down. 

On the other hand, many, if not most, of the subjects pretending
actually to be inside the boat, and using the prow as
the guide, noted changes in background views with reference to 
corresponding motions of the boat. Although the test of Spatial 
Orientation appears to be weighted in both spatial-relations and 
visualization factors, it does seem to represent best a measure 
of spatial relations or spatial orientation and to vindicate its in¬ 
clusion with other tests in the battery which were selected to 
bring out the spatial factor. 

In the following list of five tests, the first three of which were
chosen to yield evidence regarding the second hypothesis, loadings
of .34 or higher were found in rotated factors VII (S), I
(Vz), and V (V1):

Tests                           Factor VII (S)   Other Factors

(5) Spatial Orientation              .58         .42 Vz
(10) Flags                           .44         (.1[?] Vz)
(9) Cubes                            .4[?]       (.20 V2)
(11) Spatial Visualization           .44         .62 Vz
(13) Pattern Analogies               .34         .41 V1 (.24 Vz)


The magnitude of the weights in factor VII for the tests of 
Spatial Orientation, Flags, and Cubes indicates that identifica¬ 
tion of the factor as spatial relations is psychologically meaning¬ 
ful. Despite the substantial loading of the visualization factor 
in the test of Spatial Orientation, a fact which has been rationalized
previously, the first hypothesis regarding the psychological
nature of spatial relations appears to have been upheld.

Of passing interest is the loading of .34 in the spatial rela¬ 
tions factor appearing in the test of Pattern Analogies. In this 
factorially complex test, the presence of variance in the spatial- 
relations factor may have been due to the role of those changes 
in the design of complex figures, or patterns, which depended 
upon a rule involving the spatial order of parts. In the more 
difficult items of complex design it was usually helpful, if not 
necessary, to give specific attention to the spatial organization 
of the various geometric properties within each of the patterns 
appearing in the row. 

A second source for possible variance in the spatial-relations 
factor was that of the format of each item. Pattern A and pat¬ 
tern B, which stood in a left-right order on the page, corres¬ 
ponded to the order of pattern C and one of the five alternative 
responses. Having been exposed to spatial tests administered 
earlier, many of the subjects may have transferred techniques 
previously learned in solving other items to the task required 
in the test of Punched Holes. Thus, the influence of mental set 
may have been one important reason for the appearance of the 
loading in the spatial-relations factor. 

The results of the factor analysis seem to indicate that, in 
the main, the two hypotheses have been upheld. Two of the 
final rotated factors may be readily interpreted in terms of 
their weights in two groups of tests as representing the spatial- 
relations and visualization abilities that were hypothesized. 
However, the number of tests does not appear to be large 
enough to determine with confidence whether the abilities may 
be correlated to some degree. 

Much needed, indeed, are other studies to yield further evi¬ 
dence regarding the tenability of these two hypotheses. Al¬ 
though two recent empirical investigations (1, 14) have in¬ 
dicated that similar primary factors are obtained when the 
same, or nearly the same, batteries of tests are administered to 
groups chosen under different selective conditions, it is urged 
that other homogeneous samples in which such variables as 
age, level of educational attainment, occupational classification,
and sex membership are systematically varied be employed
to test the validity of the two hypotheses. Other hypotheses
should be formulated regarding the psychological nature
of the spatial domain and subjected to verification through use 
of specially devised tests and of other tests of known factorial 
composition. It is hoped that following more extensive research 
in the area of space and visualization relatively pure tests can 
be constructed* to measure the abilities identified and that 
such tests can be used with others of demonstrated merit to 
improve materially the degree of accuracy with which numerous 
complex criteria can be predicted. 

Summary 

The primary purpose of the study was to test the tenability 
of two hypotheses regarding the psychological nature of spatial- 
relations and visualization factors. A secondary purpose was to 
seek to identify certain factors found in the AAF investigations 
with certain of Thurstone's primary abilities. Within a battery
of fourteen tests, two groups of tests (three tests in each group) 
were included which appeared to reflect differences in the psy¬ 
chological processes associated with the spatial-relations and 
visualization abilities. In addition to the six tests expressly in¬ 
corporated within the battery to yield evidence regarding the 
validity of the hypotheses, eight reference tests of fairly well- 
known factorial content were included to aid in the identifica¬ 
tion of variance found in the six tests and to answer questions 
of identity of the Thurstone and AAF factors. 

Positive evidence for the hypotheses was to be considered 
attained if the two groups of tests defined separate factors and 
if none of the other eight tests was substantially weighted in 
factors unique to either group of tests. Moreover, none of the 
three tests in one group should contain large amounts of vari¬ 
ance in common with tests of the other group except to the 
extent that a given test might consist of items that reflected
the presence of that factor which was defined in the main by 

* Even if pure tests cannot be constructed for all factors identified in the spatial
realm, means are available for attaining estimates of univocal factor scores through
use of suppression tests (8).
tests of the other group. If a test did appear in one group that
contained variance in the factor associated primarily with tests 
of the other group, a satisfactory rationalization of this finding 
would be required. 

Product-moment correlations computed from sets of scores 
of 360 students in the introductory course in psychology at 
Rutgers University were factored by Thurstone’s centroid 
method. Eight of these factors were rotated by graphical means 
to positions satisfying the criteria of positive manifold and 
simple structure. 

In the orthogonal system six factors were identified as verbal 
comprehension, numerical facility, perceptual speed, reason¬ 
ing, visualization, and spatial relations. In the main, the vari¬ 
ances associated with factors identified as spatial relations and 
visualization were confined to the respective groups of tests 
initially placed within the battery to bring out the factors. In 
only one test in each group of three tests were substantial 
amounts of variance found in both the visualization and spatial- 
relations factors, although the larger portion of variance was in 
the factor common to the group in which that test appeared. 

The presence of variance in these two factors was ration¬ 
alized for each of the tests. Introspective reports of the sub¬ 
jects revealed that in many items the psychological processes 
used involved both spatial-relations and visualization abilities 
as described in the hypotheses. The range of difficulty level of
test items in one test also appeared to be an important reason 
for the appearance of two factors. 

In short, it may be concluded that the two hypotheses re¬ 
garding the psychological nature of visualization and spatial 
relations were confirmed. However, other research projects need 
to be carried out with a variety of samples before a dependable 
generalization can be made regarding the nature of these two 
abilities. Since there is some evidence of still other spatial 
abilities (3), some or all of which may be correlated, it is recom¬ 
mended that a conscientious attempt be made to formulate in 
operational terms new hypotheses and that new tests, having 
been constructed in harmony with the hypotheses, be factor 
analyzed along with other tests of established factorial content.
Once the area of space has been dependably and adequately
mapped, attention can be directed toward building
tests approximating pure measures of the identified abilities.

ON THE USE OF INTERACTIONS AS “ERROR TERMS”
IN THE ANALYSIS OF VARIANCE 1

ALLEN L. EDWARDS
University of Washington

I. 

Many psychological and educational experiments are concerned
with two or more variables, each of which may be varied
in two or more ways. When the variables are studied in all possible
combinations in the same experiment, the experiment is
said to be of factorial design. 2 As an example, let us take an experiment
in which three variables are involved, A, B, and C.
Suppose that A is varied in three ways, B is varied in two ways,
and C is varied in four ways. Then we shall have (3)(2)(4) =
24 combinations of variables, each combination corresponding 
to a particular experimental condition. One replication of the 
experiment will thus require 24 observations and the 23 degrees 
of freedom available with one replication would be allocated in 
the following way: 


Sum of squares                                   df

Main variables:               A                   2
                              B                   1
                              C                   3
First order interactions:     A X B               2
                              A X C               6
                              B X C               3
Second order interaction:     A X B X C           6


If 240 subjects were available, then 10 could be assigned at
random to each of the 24 experimental conditions. We would
thus have 9 degrees of freedom within each of the experimental
conditions or (9)(24) = 216 degrees of freedom for the variation
of subjects treated alike. The sum of squares for the 216
degrees of freedom would be the pooled sums of squares within
groups which would be used to derive the mean square for testing
the significance of the main experimental variables, the
first-order interactions, and the second-order interaction.

1 This paper is based upon a section of a manuscript which deals more extensively
with problems of experimental design in psychological and educational research.
I should like to acknowledge that I have incorporated into this paper the suggestions
of Dr. Paul Horst, who served as a technical consultant on the manuscript.

2 It is assumed that the reader is familiar with the treatment of the analysis of
variance as given, for example, by Lindquist (6), McNemar (7), or Snedecor (8).

In general, it may be said that whenever replication is present 
within the experimental design, the within-groups or mean 
square based upon replication is the appropriate error term 
against which to test the significance of all other mean squares. 
An exception to this rule, discussed in the next section, would 
be when the categories or classifications of one of the variables 
may be regarded as a random selection from the population 
being sampled. 

Let us assume that, in the experiment described, the A
variable corresponds to three instructors, the B variable to two
methods of instruction, and the C variable to four
schools. Each instructor teaches both methods in each of
the four schools. We shall assume that 60 subjects have been
selected at random within each school to serve in the experimental
groups. The complete analysis of variance of achievement
scores on a standardized test given at the end of the experiment
would result in the following sums of squares with
associated degrees of freedom:


Sum of squares                              df

Instructors                                  2
Methods                                      1
Schools                                      3
Instructors X Methods                        2
Instructors X Schools                        6
Methods X Schools                            3
Instructors X Methods X Schools              6
Residual within groups                     216
Total                                      239


Let us further assume that all of the mean squares, obtained 
by dividing the sums of squares by the corresponding degrees 
of freedom, are significant when tested against the residual 
mean square within groups. This would mean, first, with respect 
to the main variables: that significant differences are present 
among instructors; that the two methods differ significantly; 
and that there are significant differences among schools.

The interaction between instructors and methods, if significant,
would mean that the differences among instructors are
dependent upon the method used or that the difference between
the methods depends upon the instructor variable. A
significant interaction between instructors and schools would
mean that the differences observed among the instructors are
dependent upon the schools or that the differences observed
among schools are dependent upon the instructors. A significant
methods and schools interaction would mean that the difference
observed between the methods is dependent upon the schools or
that the differences among schools are dependent upon the
method of instruction.

If the second-order interaction is significant, this would mean 
that the differences observed among instructors are dependent 
upon the methods and the schools; that the differences observed
among the schools are dependent upon the instructors and the 
methods; or that the difference observed between the methods 
is dependent upon the schools and the instructors. 

Now, in view of a significant second-order interaction, our 
conclusions concerning the main variables consisting of schools, 
methods, and instructors, are somewhat limited. We know that
there are significant differences present for these three variables, 
but we know also, from the significance of the interaction, that 
the difference observed, let us say, for methods, is to some ex¬ 
tent dependent upon the schools and instructors. 

If our interest is only in the two particular methods, the three 
particular instructors, and the four particular schools, involved 
in the experiment, then our analysis and the tests of significance 
of the various mean squares, using the residual mean square as
an error term, are appropriate. Each mean square has been 
evaluated and the conclusions reached are definite. Examina¬ 
tion of the means for the various combinations of experimental 
conditions would probably reveal that in a particular school, 
one method is more effective than another, when used by a par¬ 
ticular instructor, and we could make recommendations ac¬ 
cordingly. 

II. 

In an experiment such as that described, however, our primary
interest may be in the difference observed between the
two methods of instruction which we have used. Furthermore,
we may wish to make recommendations beyond the particular 
schools investigated. Can we say that a particular method will 
probably be more effective, on the average, for all schools, in¬ 
cluding those we have not actually investigated? 

Let us suppose that we have selected the instructors to repre¬ 
sent particular types or personalities or abilities. The three used 
in the experiment are definitely not a random sample from any 
defined population. Nor have we selected at random from any 
population of methods of instruction; instead, we have picked 
two particular methods for investigation. But it is possible that 
we might have made schools a random variable by selecting the 
schools at random from a defined population of schools for a 
given city, county, or school district. If this had been our in¬ 
tention, of course, we would undoubtedly have taken a larger 
sample than the four schools at hand. Let us suppose, however, 
that the schools have been selected at random. 

We now have the case mentioned earlier, where one of our 
variables may be considered a random sample from a defined 
population. In this sense the schools consist merely of replications 
of the experimental design in which the main variables are the 
instructors (varied according to type) and methods. Under this 
condition the highest-order interaction involving the random 
variable may be regarded as the appropriate error term for test¬ 
ing the significance of the next lower-order interactions. But 
before proceeding on this basis, another condition must hold 
true; the interaction must be significantly larger than the resid¬ 
ual mean square within groups. It cannot, of course, be smaller
except by chance. If it is smaller, the residual mean square 
within groups should be used in testing the significance of the 
next level of interactions. 

Let us assume, in the present instance, that the second-order 
interaction is significant when tested against the mean square 
within groups. We now proceed to test the next level of inter¬ 
actions against the second-order interaction. Whichever ones 
of these prove not to be significant when tested against the 
second-order interaction may be combined with the second- 
order interaction to give us an error term based upon a larger 
number of degrees of freedom. 

Under the assumptions we have made, it is quite likely that,
if the second-order interaction is significant when tested against
the residual mean square within groups, some of the simple
interactions will not prove to be significant when tested against
the second-order interaction. The obvious reason for this is that
the mean square for the second-order interaction will be larger
than the residual mean square within groups. The F's thus obtained,
besides being based upon a smaller number of degrees
of freedom, will be smaller than in the first instance.

Let us suppose that only the simple interaction involving instructors
and methods is significant when tested against the
second-order interaction. The non-significance of the interaction
between methods and schools and the interaction between
instructors and schools, of course, means that we no longer have
any basis for inferring that the difference observed between
methods is dependent upon the schools, or that the differences
observed among the schools are dependent upon the methods.
Similarly, the evidence would now indicate that the differences
among instructors are not dependent upon the schools, or that
the differences among schools are not dependent upon the instructors.
The sums of squares for these two interactions may
be pooled with the sum of squares for the second-order interaction,
along with their associated degrees of freedom. The
analysis would now take this form:

Sum of squares                              df

Instructors                                  2
Methods                                      1
Schools                                      3
Instructors X Methods                        2
Pooled interactions                         15
Residual within groups                     216
Total                                      239
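
The pooling step that yields this table may be sketched as follows. The degrees of freedom are those given in the text; the sums of squares are hypothetical values chosen only for illustration, and the helper name `pool` is our own.

```python
def pool(terms):
    """Pool (sum_of_squares, df) pairs into (pooled SS, pooled df, mean square)."""
    ss = sum(s for s, _ in terms)
    df = sum(d for _, d in terms)
    return ss, df, ss / df


# df values are from the text; the sums of squares are HYPOTHETICAL.
instructors_x_schools = (90.0, 6)
methods_x_schools = (45.0, 3)
second_order = (120.0, 6)   # Instructors X Methods X Schools

pooled_ss, pooled_df, pooled_ms = pool(
    [instructors_x_schools, methods_x_schools, second_order]
)
print(pooled_ss, pooled_df, pooled_ms)
```

Whatever the sums of squares may be in a given experiment, the pooled term carries 6 + 3 + 6 = 15 degrees of freedom.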

Now, how shall we test the significance of the mean squares 
for instructors, methods, and schools? If we could assume that 
either instructors or methods constituted a random sample from 
a population of instructors or a population of methods, the in¬ 
structor and methods interaction might be considered an ap¬ 
propriate error term for testing the significance of the mean 
square for instructors and the mean square for methods. This, 
however, is not a plausible assumption. The appropriate error
term is the pooled interaction mean square based upon 15 degrees
of freedom. It does include all of the interactions involving
the variable which we have assumed to be randomly selected, 
schools. If we now test the mean squares for instructors, meth¬ 
ods, and schools, against the pooled interaction mean square, 
and, if they are significant, what conclusions can be drawn? 

It is the methods mean square that is of primary interest and 
its significance would indicate that the difference between meth¬ 
ods was not dependent upon, or could not be accounted for, in 
terms of differences in the schools. A similar statement could be 
made concerning the instructors if this mean square was sig¬ 
nificant. In view of a significant interaction between methods 
and instructors, however, it would still be necessary to qualify 
our recommendations; the difference between the methods is 
still dependent upon the instructors. But the means for the
various instructors teaching the various methods could be examined
for whatever insight this might give us as to the nature
of the interaction.3
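
The effect of the choice of error term on the test of the methods mean square can be illustrated with a brief sketch. All numerical values below are hypothetical and serve only to show the direction of the effect; the function name `f_ratio` is our own.

```python
def f_ratio(ss_effect, df_effect, ms_error):
    """F = (effect mean square) / (error mean square)."""
    return (ss_effect / df_effect) / ms_error


# HYPOTHETICAL values, for illustration only.
ss_methods, df_methods = 340.0, 1
ms_within = 8.0    # residual within groups, 216 df
ms_pooled = 17.0   # pooled interactions involving schools, 15 df

# Schools regarded as fixed: test against the within-groups mean square.
f_fixed = f_ratio(ss_methods, df_methods, ms_within)
# Schools regarded as a random sample: test against the pooled interactions.
f_random = f_ratio(ss_methods, df_methods, ms_pooled)

print(f"F(1, 216) = {f_fixed:.1f}")
print(f"F(1, 15)  = {f_random:.1f}")
```

Because the pooled interaction mean square is larger and rests on fewer degrees of freedom, the second test is the more conservative of the two; it is the one that bears on generalizing to the population of schools.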

The analysis we have described is dependent upon a number
of considerations and these should perhaps be emphasized once
more. If the interaction or pooled interaction mean square is to
be used as an error term instead of the residual mean square
within groups, it should be larger than the residual mean square.
If it is smaller, it is so only by chance. Furthermore, it is necessary
that the categories of one of the variables in the experimental
design be a random selection from the population being
sampled.4 In the experiment discussed, for example, it would be
necessary for the schools to be selected at random from a defined
population of schools. In this case, the categories of the randomly
selected variable may be regarded as replications of the
experiment, and there is some justification for the use of the
interaction as an error term, instead of the residual mean
square.5

3 What if all of the first-order interactions had proved to be significant when tested
against the second-order interaction? In this case, the interaction between methods
and schools might be used to test the significance of the methods mean square, and
the interaction between instructors and schools might be used to test the significance
of the mean square for instructors. We should keep in mind that in following this
procedure, our interest is in being able to generalize concerning the methods, for
example, in the population of schools.

4 This condition will not be met by argument after the experiment has been carried
through to completion. For example, it would be illogical to argue that the two particular
methods of instruction selected for investigation have been randomly selected from
a population of methods.

III. 

In some complex experiments, involving many possible com¬ 
binations of experimental variables and consequently many ex¬ 
perimental conditions, replication is not used and the sums of 
squares for the higher-order interactions are pooled, along with 
the degrees of freedom associated with them, to obtain an esti¬ 
mate of experimental error (residual mean square within 
groups). The mean square thus arrived at is used in the manner 
in which the mean square based upon the variation within 
groups has been used in the experiment described, i.e., as an 
estimate of the uncontrolled variation against which to test the 
significance of the other mean squares. 

An example of this design is to be found in an experiment by
Crutchfield (3), in which five variables were each varied in three
ways in an investigation of “behavior potentials.” Animals were
placed in a pulling compartment in which there was a string
arranged by pulleys to a food pan. By pulling on the string the
animals could pull the food pan next to the compartment and
thus eat. A friction device was used to increase or decrease the
force required for pulling the food pan, and behavior was studied
under all possible combinations of the experimental variables.

Variable A was the length of the string attached to the food
pan and this was varied by the use of 60 cm., 120 cm., and 240
cm. lengths. Variable B was the force required to pull the food
pan in on the training trials and this was varied by using a low,
medium, and high setting of the friction device. Variable C was
the number of training trials given the animals and this was varied
by giving 30, 60, and 90 trials. Variable D consisted of the
number of hours between the crucial test trial and the last
feeding period. This was varied with intervals of 12 hours, 24
hours, and 48 hours. The final variable, E, was the force required
to pull the food pan during the crucial test trial and this
was varied in the same ways as during the training trials.

5 This is the situation in experiments involving repeated measurements on the
same subjects, where the interactions involving subjects are used to provide an estimate
of experimental error under the assumption that the subjects have been randomly
selected from a defined population. Some of these experimental designs are described
by Grant (4), Brozek and Alexander (2), and Kogan (5).

By varying each of the five variables in three ways, a total 
of 3^5 = 243 combinations of the variables are possible. One
replication of the experiment, assigning one animal to each ex¬ 
perimental condition, would thus require a total of 243 animals. 
Each additional replication would require another 243 animals. 
Crutchfield decided to forego any additional replications and to 
use as an error term a mean square based upon the higher-order 
interactions. 

Each of the experimental variables will be based upon 2 degrees
of freedom, accounting for a total of 10 degrees of freedom.
The first-order interactions will each be based upon 4 degrees of 
freedom, accounting for a total of 40 degrees of freedom. The 
second-order interactions, each based upon 8 degrees of free¬ 
dom, will account for 80 degrees of freedom; the third-order 
interactions, each based upon 16 degrees of freedom, will account 
for 80 degrees of freedom, and the remaining 32 degrees of 
freedom will be associated with the fourth-order interaction. 
Crutchfield pooled the sums of squares for all interactions be¬ 
yond the first-order along with their degrees of freedom to ob¬ 
tain as his estimate of experimental error a pooled interaction 
mean square based upon 192 degrees of freedom. 
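
The count of degrees of freedom in this paragraph can be verified directly. The sketch below assumes only what the text states (five variables, three levels each, one animal per condition); the variable names are our own.

```python
from math import comb

n_vars, n_levels = 5, 3

# For effects involving k of the 5 variables: how many such effects exist,
# the df of each, and the df they account for together.
df_by_k = {}
for k in range(1, n_vars + 1):
    count = comb(n_vars, k)
    df_each = (n_levels - 1) ** k
    df_by_k[k] = (count, df_each, count * df_each)

labels = {1: "main variables", 2: "first-order", 3: "second-order",
          4: "third-order", 5: "fourth-order"}
for k, (count, df_each, df_all) in df_by_k.items():
    print(f"{labels[k]:<15} {count:>2} x {df_each:>2} = {df_all:>3}")

df_total = sum(df_all for _, _, df_all in df_by_k.values())
df_pooled_error = sum(df_by_k[k][2] for k in (3, 4, 5))
print("total:", df_total)                # 3**5 - 1
print("pooled error:", df_pooled_error)  # all interactions beyond first-order
```

The totals agree with the text: 242 degrees of freedom in all, of which 80 + 80 + 32 = 192 belong to the interactions pooled as error.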

IV. 

Assumptions are involved, of course, in the pooling of the
sums of squares for higher-order interactions and their associated
degrees of freedom. In the first place, it is assumed that
each of the mean squares corresponding to the higher-order 
interactions is an estimate of the same common population 
variance, i.e., the assumption of homogeneity of variance is in¬ 
volved. It is also assumed that this common variance would not 
differ significantly from the variance estimate obtained with 
replication. If the higher-order interactions are not significant— 
and without replication and a corresponding test of significance 
this must remain an assumption—then the mean square de¬ 
rived from these interactions will estimate the same variance as 
estimated by the mean square within groups. 

Under these conditions, the experimental variables, A, B, C,
D, and E, may be tested for significance by the mean square
based upon the higher-order interactions. The significance of the
first-order interactions may be tested in the same manner. If 
none of the first-order interactions is significant, this provides 
good evidence that none of the higher-order interactions will 
be significant and therefore justifies the use of the higher-order 
interactions as an error term. 

Let us suppose, however, that one of the first-order inter¬ 
actions, let us say, the interaction between variable A and vari¬ 
able B, turns out to be highly significant. If that is the case, then 
the mean square based upon the pooled sum of squares for all 
higher-order interactions is likely to be biased in the direction 
of overestimating the “pure" experimental error that would 
have been obtained from replication of the experiment. 

If the first-order interaction between A and B is significant, 
we should then isolate the sums of squares for the second-order 
interactions which involve these two variables. These second-order
interactions would be A X B X C, A X B X D,
and A X B X E. These sums of squares and their associated
degrees of freedom would be subtracted from the pooled sum of 
squares and degrees of freedom for all higher-order interactions. 
Since each of the second-order interactions is based upon 8 
degrees of freedom, then the subtraction of the three second- 
order interactions mentioned would leave a pooled sum of 
squares based upon 168 degrees of freedom. The significance of 
the three second-order interactions in question could then be 
tested against the residual mean square based upon 168 degrees 
of freedom. 
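
The adjustment described above amounts to listing the second-order interactions that contain both A and B and removing their degrees of freedom from the pooled error term. A minimal sketch, with variable names of our own choosing:

```python
from itertools import combinations

variables = "ABCDE"
df_pooled = 192                # all interactions beyond the first order
significant_pair = {"A", "B"}  # the first-order interaction found significant

# Second-order interactions involve three variables; keep those with A and B.
second_order = list(combinations(variables, 3))
to_isolate = [t for t in second_order if significant_pair <= set(t)]

df_each = (3 - 1) ** 3         # three levels per variable, 8 df each
df_remaining = df_pooled - len(to_isolate) * df_each

print([" X ".join(t) for t in to_isolate])
print(df_remaining)
```

The same subtraction would be applied to the pooled sum of squares, using the three isolated sums of squares obtained from the data.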

It has been mentioned that homogeneity of variance of the 
higher-order interaction mean squares is also involved in pool¬ 
ing them to obtain a single estimate of experimental error. Each 
of the mean squares based upon a higher-order interaction 
might be found and the set tested for homogeneity of variance 
by means of Bartlett's test (1). If the test of this hypothesis does 
not result in the rejection of the hypothesis of a common vari¬ 
ance, then the pooling of the various sums of squares and de¬ 
grees of freedom is proper. 

Although the procedure of using interactions as estimates of
experimental error has been followed in much published research,
we should keep in mind that there is no substitute for
replication. If there is an a priori reason for expecting interactions
to be significant, a test, based upon replication, should
be provided in the design of the experiment. If the interaction
mean squares are significant, then their use as an estimate of
the mean square that would have been obtained with replication,
the within-groups mean square, may result in an underevaluation
of the significance of the main experimental variables.

THE OBJECTIVE MEASUREMENT OF
DYNAMIC TRAITS

R. B. CATTELL, A. B. HEIST, P. A. HEIST, and R. G. STEWART

The Ergic Theory of Attitude Measurement

It is disconcerting that psychologists have not yet found any 
more objective way of measuring an individual’s attitudes and 
interests than by asking him how strong they are. In 1935 the 
present writer demonstrated some degree of validity in measures 
of spontaneous attention and of memory, for matters of interest 
(3). But, apart from the work of Super (17) and one or two 
sporadic, incidental uses of these newer methods, the bulk of 
research has continued to concentrate on refinements of verbal,
self-declaratory attitude and interest scales (12, 14), which, in
the writer's opinion, can never satisfy the need for scientific,
behavioral objectivity and meaning. Even the applied psychologists
working with polls and socio-economic attitudes have
regretfully had to realize that what a man says is unpredictably
different from what he does and sometimes, indeed, from what
he said an hour before (14). The present research, and two
studies reported elsewhere (8, 9), are attempts to follow up on
a more adequate scale, and to expand in new directions the 
original statement (3) of design for objective interest measure¬ 
ment. 

Dynamic traits are divisible into ergs, or basic innate drives, 
on the one hand, and metanergs, or attitudes and sentiments, 
on the other (4, 5). The present study is concerned with 
attitudes, but, since the attitude is, in respect to modes of 
measurement, a prototype of all dynamic traits, the methods 
developed here have reference, and are applicable to, dynamic 
traits generally.

An attitude needs to be defined initially by five aspects, 
which are summarized in the paradigm:

“(1) In these circumstances (2) I (3) want so much (4) to do
this (5) with that.”

Here (1) defines the stimulus situation with reference to
which the attitude is evoked, (2) the organism bearing the 
attitude, (3) strength of interest in the course of action indi¬ 
cated, (4) the kind of action indicated and (5) the object with 
which the attitude is connected. Sometimes (1) and (5) are 
the same. 

According to the ergic theory of attitude measurement (5) 
an attitude may be expressed, for purposes of analysis and 
calculation, as a vector quantity, in which the length of the 
vector represents the strength of desire for (interest in) the 
defined course of action, and its direction represents its dynamic 
composition. It assumes that ergic coordinates can be discovered 
and defined by appropriate factor analytic procedures so that 
by giving the direction of the attitude with respect to these 
coordinates we describe the extent to which various ergs, e.g.,
hunger, sex, self assertion, pugnacity, gain expression through 
the attitude in question. An attitude is thus not regarded, by 
the ergic theory, as adequately expressed by the existing con¬ 
vention of pro- and con- an object; for an attitude about an 
object is far richer than a single dimension can express and is 
better defined in terms of all those basic-drive satisfactions 
which the given action to the object produces. One can, of
course, correctly speak of a pro-con scale with respect to a 
defined course of action, i.e., one already defined in direction, as
above. But a person may utilize the same object for many 
different courses of action, so that for this reason, as well as 
because of the possibility of fuller understanding given by 
expressing the ergic composition of the course of action, it is 
psychologically meaningless to speak of being “pro” or “con”
an object.

The above discussion of basic theory is necessary if the 
meaning of the present experiments is to be understood and 
their findings properly applied. It leads to a formula for the 
strength of an attitude parallel to that used in the specification 
equation for expressing some particular skill in terms of primary 
abilities (4), as follows:

$$I_{ij} = S_{1j}E_{1i} + S_{2j}E_{2i} + \cdots + S_{nj}E_{ni} + S_{j}E_{j},$$

where I is the strength of interest of the individual i in the
course of action defined by the attitude j. The S's are the factor
loadings, which in this case we shall call the dynamic situational
indices, defining the extent to which the various ergs or drives
$E_1$, $E_2$, etc., are involved (for the average member of the
population) in determining the course of action concerned. It
is our purpose to measure I, the strength of interest, by more
objective methods. The measurement of the S's, i.e., the directions
of the attitude vectors, is described elsewhere (5, 8). The
measurement of the strength of an attitude is thus a measurement
of interest. An attitude is measured when we measure both
interest and ergic composition, i.e., length and direction of the
vector.
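
The specification equation is a weighted sum and can be illustrated numerically. The sketch below is ours, not the authors': the function name, the choice of three ergs, and every numerical value are hypothetical and are used only to show how the terms combine.

```python
def interest_strength(situational_indices, erg_scores,
                      specific_index=0.0, specific_score=0.0):
    """Strength of interest I for one individual and one attitude.

    situational_indices: the S's for the attitude, one per erg.
    erg_scores:          the individual's E's on the same ergs.
    The last two arguments form the specific term of the equation.
    """
    if len(situational_indices) != len(erg_scores):
        raise ValueError("need one situational index per erg score")
    common = sum(s * e for s, e in zip(situational_indices, erg_scores))
    return common + specific_index * specific_score


# HYPOTHETICAL values for three ergs, for illustration only.
indices = [0.5, 0.25, 0.0]
scores = [1.0, 2.0, 3.0]
strength = interest_strength(indices, scores,
                             specific_index=0.5, specific_score=1.0)
print(strength)
```

An erg whose situational index is zero contributes nothing to the strength of the attitude, however high the individual's score on that erg.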

Possible Approaches to Objective Measurement of Dynamic Traits

Considering an attitude as a dynamic trait, it is easy to 
perceive, from what is already known about psychodynamics, 
that there is a wide array of possible principles for the objective 
measurement of attitude strengths. The following will be briefly 
discussed here and the majority, those starred, will have their 
application to experiments described precisely.

A. Criterion Methods, (a) Interactive. By these are meant
methods of measurement too long and difficult for routine test
use, but which, when properly applied (19), supply data that can
be taken as a true measurement of what is meant by interest,
in objective, “interactive” (4) units, in the real life situation.

* (1) Money. Fraction or absolute amount of the individual's
income that he spends on certain courses of action.

* (2) Time. Fraction of the individual's time that he gives
to certain courses of action. (18)

B. Criterion Methods, (b) Solipsistic. By these are meant
methods of measurement dependent on introspection and self
assessment but which, in the specially controlled circumstances
of experiment with intelligent, cooperative subjects, can be
used as criterion data.

* (3) The classical “opinionaire” method, as used by Thurstone
(20) and others.

* (4) The “preference” method, in which the individual is
presented with alternate courses of action (attitudes)
and asked which he would prefer to satisfy. This is
done in all possible paired comparisons among, in
this case, 50 attitudes, and thus supplies a more thorough,
pointed measure along the lines of (3). It is the
same situation for human beings as that presented to
animals in the classical “choice box” experiment on
motivation strength (21) except that the reaction is
verbal only.
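
The size of the task set by method (4) follows from the number of distinct pairs among 50 attitudes. The sketch below counts the pairs and shows one simple way of turning choices into scores. Scoring each attitude by the number of times it is preferred is our assumption; the text does not state the scoring rule, and the item names and choices are hypothetical.

```python
from collections import Counter
from itertools import combinations


def all_pairs(items):
    """Every unordered pair of items, each pair presented once."""
    return list(combinations(items, 2))


def preference_scores(items, chosen):
    """Score each item by the number of pairs in which it was preferred.

    chosen maps each pair (as returned by all_pairs) to the preferred member.
    """
    wins = Counter({item: 0 for item in items})
    for pair in all_pairs(items):
        wins[chosen[pair]] += 1
    return dict(wins)


attitudes = [f"attitude_{k}" for k in range(1, 51)]
n_pairs = len(all_pairs(attitudes))
print(n_pairs)  # 50 * 49 / 2

# HYPOTHETICAL miniature example with three attitudes.
demo_items = ["X", "Y", "Z"]
demo_choices = {("X", "Y"): "X", ("X", "Z"): "X", ("Y", "Z"): "Z"}
demo_scores = preference_scores(demo_items, demo_choices)
print(demo_scores)
```

With 50 attitudes each subject makes 1,225 judgments, which is one reason the method is classed here as a criterion rather than a routine test.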

C. Attention-Memory (Learning) Methods, (a) In the immediate
situation. These depend on the principle that interest (incentive)
is a determiner of attention, rate of learning, inhibitory
effects on other processes, etc., and seek to measure interest
through such effects.

(5) Attention time. Recording the length of time or the
rank order in which the individual will spontaneously
attend to various stimuli.

* (6) Immediate Memory. Since there seems little point, as
far as we know, in separating measures of “observation”
from “immediate memory,” this records instead
of “attention” the amount of various interest data
recalled almost immediately after exposure. As indicated
later, the measure was tried separately for statements
facilitating the expression of the attitude and
statements frustrating it.

(7) Reminiscence. It would seem likely that reminiscence,
the selective action of memory as determined by contrasting
immediate with more remote recollection,
might be particularly correlated with interest.

* (8) Distraction. This method aims at measuring the attention
effect indirectly by recording the failure to perceive
surrounding material when the interesting object
is presented.

(9) Retro-active Inhibition. As with distraction, the interest
an individual has for certain matters, particularly in
the deeper interests, might be validly measured by the
amount of retro-active inhibition their consideration
exerts upon some prior, standard learning process.

D. Methods Appraising Cognitive and Dynamic Structure due
to Interests. The methods under C depend on learning effects
of interest in the immediate test situation, but if we are willing
to accept the slight error due to time lag we can measure
interests alternatively by the effects they have had in the course
of time upon information, skills and dynamic response habits.

* (10) Information. This method tests the individual's information
about facts, devices, etc., necessary to implement
the course of action in which he is interested (not
necessarily knowledge about the object).

* (11) Speed of Decision (Reaction time). This method assumes
that decisions will be given more quickly for
questions in regard to which the individual has more
intense conviction. Preliminary work already indicates
the probability of this.1

(12) Level of Skills. The extent of the built-up skills in a
certain course of action may, like the level of information,
provide a measure of the strength of interest
therein, e.g., performance on a piano provides an index
of musical interests, or skill in shooting of hunting
interests. Time and errors in suitably chosen diagnostic
performances would thus provide a measure of
this area. So also might speed of decision in a different
context from (11) above, namely in that there would
be, through practice, greater quickness in making decisions
in those fields with which S is familiar.

E. Autism Methods. In research on so-called “projective”
tests the present writer has pointed out (7) that devices in this
area are more aptly called apperception tests (since such measures
include both cognitive and dynamic sources of distortion).
Within the apperceptive class, however, we may distinguish 
autism tests, which deal with distortions of perception, reasoning 
and memory through dynamic traits alone. Ego defense dynamisms 
tests are a sub-category within autism tests. The autism methods
used to measure the dynamic traits of special significance to
personality are obviously applicable to interests in general,
though the defense dynamism tests are not so relevant.

1 Chant and Salter (10), presenting an “attitude to war” opinionaire to a group of
mainly pacifist subjects, found that items which demanded /ower decision had a larger
P.G.R. (0.72 ± .07), but that more “militaristic” items had larger P.G.R. and more
neutral items a longer decision time (i.e., curvilinear relationships exist). What bears
more simply on our approach is their finding that rejected statements had larger deflections
(.71 ± .16) and longer decision times (mean 2.6 ± ,ofl greater than accepted).
At the reading of the present paper at the annual Mid-Western A.P.A. meeting in
Chicago *940, Gailcnbcck (i.p announced that he had results, but more finely analyzed,
entirely confirming the relation between affirmative decision times and strength of
convictions, presented here. The results of Postman (tj) are also in agreement with
this use of decision time as a strength of attitude measure.

* (13) Misperception (Perceptual Autism or Illusion). In
this method defective sensory presentations (mainly of
words) are made such that the individual may be
tempted to apperceive them in accordance with his
wishes. He is scored on the number misperceived to
fit in with his attitude.

* (14) False Belief (Reasoning Autism or Delusion). The
method presents a number of manipulatable statements
of fact and logic so chosen that the individual
with a strong attitude will experience a need to distort
his factual beliefs in a certain direction better to
support his attitude.

* (15) Phantasy. This method treats phantasy in toto and
not merely the defense dynamism forms. A measure of
time spent phantasying or of choice of phantasy reading
in presented alternatives is recorded.

* (16) Projection (Defense dynamism). Two types of controlled,
selective answer tests are possible in this area.
(a) That in which the picture or the verbal statement
of activity is fixed and the subject selects the best of
the alternative dynamic “explanations” of the behavior
(see design in (9)). (b) That in which the subject
chooses the activities, from a presented list, of which
he prefers to “explain” the motive. The latter is
psychologically more complex but has not been tried
and it was the especial interest of one co-worker to
try it out here. (9)

(17) Ego Defense Dynamisms. It is possible that any other
defense dynamism, e.g., reaction formation, identification,
rationalization, true projection, defensive
phantasy, could be used here, by methods described
elsewhere (7), but such methods would be restricted
by applying only to interests connected with ego conflicts
and were not tried out at this stage of exploration.

F. Activity Level Methods, (a) Psychological. In this category,
which includes some relatively miscellaneous approaches, we
include attempts to measure increases in the general excitement
level of the organism due to arousal of interest by the
stimulus in the experiment.

* (18) Fluency. A measure of the sheer amount written, in a
given time, in a “completion” test of statements concerning a
given attitude.

* (19) Speed of Reading. A method based on the hypothesis that
an individual will read more rapidly material which interests
him and which is in agreement with his own attitudes.

* (20) Work-Endurance Measures. This method plans to measure
work output (endurance of fatigue) or endurance of pain or
discomfort in the interest of various attitudes and is thus
analogous to the obstruction method in animal motivation
studies (21). Miniature situations involving satisfaction of
the particular attitudes could be made, for example, in terms
of satisfaction of curiosity in reading about facts
contributory to the total attitude satisfaction.

G. Activity Level Methods, (b) Physiological. The known,
promising methods of measuring increase in activity level are
more numerous in the physiological field, where autonomic and
metabolic measures have been more developed.

* (21) Psychogalvanic Response. The percentage decrease in
resistance was measured on exposure of statements favoring and
opposing the given attitude.

* (22) Pulse Rate. Difference of rate before and after
presentation of stimulus defining attitude.

(23) Metabolic Rate. A better measure, to which the above is
only an approximation, would be the increase in metabolic rate
following, in a discovered optimum period, the presentation of
the attitude statements. Because of technical difficulties we
had to be content with (22).

(24) Muscle Tension. There is evidence in the work of Duffy
that general muscle tension is as sensitive and reliable a
measure of conation as is the P.G.R. For lack of further work
confirming the measurement of conation by this method,
however, we eventually did not use general tension, but (25)
below.

* (25) Writing Pressure. The subject was asked to write “Yes”
or “No” according to his reaction to presented attitude
statements. A device beneath the writing desk measured the
handwriting pressure he exerted in these responses.

Twenty-five distinct methods of objective attitude measurement
are suggested above to be of promise; but nine of them, (5),
(7), (9), (12), (15), (17), (20), (23) and (24), were not tried
in the present experiment, some because of special technical
difficulties, some because of similarity to methods already in
the sample and some, namely (5), (15), and (17), because an
idea of their effectiveness has already been gained from
earlier research (3), (7), (11). Of the sixteen methods tried,
twelve are described here and the rest elsewhere (9).

The Experimental Design 

The proof of goodness of an attitude measurement method is
valuable only if it applies to any kind of attitude.
Consequently, it was our objective to design the experiment so
that a wide range of methods could be applied to a sufficient
sample of a wide range of attitudes. Twelve attitudes were
taken, sampled from (1) those of massive importance in everyday
life (and therefore of interest to clinicians), (2) those
sampling distinct basic drives and (3) those of different
social and intellectual interest areas (such as have been of
interest to social psychologists). The list was based mainly on
the fifteen categories of Cattell’s Interest Test (6).

The twelve attitudes chosen for experiment with the various
measurement methods here described were actually administered
to the group in a total set of fifty attitudes, in connection
with an experiment described elsewhere (8). This inclusion in a
large group gave certain advantages, notably that the
preference score could be the rank order in fifty attitudes
rather than in twelve. The twelve attitudes are set out below
according to their index numbers among the fifty (8).

(1) I want to play more indoor sociable games, such as card
games.

(2) I want to spend somewhat more on drinking and smoking
than I am now able to do.

(6) I want to become proficient, if possible to excel my
colleagues, in my chosen career.

(10) I want more time to enjoy sleep and rest.

(11) I want to listen to music.

(16) I want to know more science.

(19) I want to see organized religion maintain or increase its
influence.

(22) I want to attend football games and follow the fate of
teams.

(30) I like to see a good movie or play every week or so.

(34) I want to get my wife the clothes she likes and to save her
from the more toilsome household drudgeries.

(36) I want to be smartly dressed, with a personal appearance
that commands admiration.

(44) I want to feel that I am in touch with God, or some
principle in the universe that gives meaning and help in my
struggles.

Upon these twelve attitudes the twelve methods of measurement
set out below were tried. Four methods, (4), (6), (10) and
(21), were tried on all attitudes; two methods, (1) and (2),
were tried on seven attitudes; and the remaining, newer
methods were tried on one attitude each.

Brief List of Methods Examined Here (Entirely New
Methods in Italics)

(1) Money Expended
(2) Time Expended
(4) Preferences
(6) Immediate Memory
(8) Distraction
(10) Information
(11) Speed of Decision
(13) Misperception (Illusion)
(14) False Belief (Delusion)
(18) Fluency
(19) Speed of Reading
(21) Psychogalvanic Response

It was our aim to measure validity in terms of correlation with
the pooled result of all methods. But from existing information
it is likely that some methods are better than others and,
indeed, six of the above methods, those in italics, are “long
shots” with no previous work on them whatever; so we decided to
make the validating core out of the first six, hereinafter
designated “tried” tests, because previous work (1), (3), (11),
(16), (17), (18), (19) has shown some degree of validity. Also,
we desired to know the relative goodness of these first six
tried tests with greater accuracy, whereas we were interested
only as to whether there exists any validity at all in the
exploratory (italicised) tests. It was for this reason that the
tried test methods were applied to the majority of attitudes,
but each of the exploratory methods was tried on one attitude
only.

The subjects were a population homogeneous as to sex (men)
and chosen to have family interests (all were married) but
otherwise diverse (some students, some business men) and
ranging in age from 20 to 40 (80 per cent between 25 and 33), so
that though all possessed the attitudes in question they would
do so in diverse degrees. Six methods (the “tried” methods)
were applied to all subjects but not on all attitudes, for each
40 subjects took a different pair of attitudes. The six
exploratory methods were therefore each applied only to one
attitude and 40 subjects.

A more detailed statement of the method of administration
of the twelve methods follows.

(1) Money Expended (No. 1 in general list; used on all
attitudes). Two weeks, a month apart and clear of any special
holiday season, were taken and S was asked to record his
expenditure on the particular interest activity concerned for
the whole week. Reliability coefficients were calculated with
respect to the two one-week periods.

(2) Time Expended (No. 2 in general list; used on all
attitudes). In the same two weeks S recorded separately for
each, and at the time, the number of hours spent in the given
activity interest. (See (18).)

(3) Preference (No. 4 in general list above; used on all
attitudes). A matrix of cells was constructed, constituted by
the triangular area bounded by the full fifty attitudes arranged
in rows on the right and in columns from right to left. Each
cell thus represented a possible comparison of the strength of
one attitude with that of another. S thus made 1225 paired
comparisons, indicating in each case which of the two attitude
goals concerned in the comparison he would rather satisfy. The
score for a given attitude was the fraction of the 49
comparisons in which it was the preferred member.
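The arithmetic of the Preference score can be sketched as follows. This is an illustrative reconstruction, not the authors' scoring procedure: the function name `preference_scores` and the callback `prefer` (standing in for the subject's recorded choice in each cell) are hypothetical. It shows why fifty attitudes give 1225 cells and why each score is a fraction of 49.

```python
from itertools import combinations

def preference_scores(n_attitudes, prefer):
    """Fraction of its paired comparisons that each attitude wins.

    prefer(i, j) is a hypothetical stand-in for the subject's choice:
    it returns i or j, the attitude whose goal he would rather satisfy.
    """
    wins = [0] * n_attitudes
    pairs = list(combinations(range(n_attitudes), 2))  # one cell per pair
    for i, j in pairs:
        wins[prefer(i, j)] += 1
    # Every attitude takes part in (n_attitudes - 1) comparisons.
    scores = [w / (n_attitudes - 1) for w in wins]
    return scores, len(pairs)

# Invented subject who always prefers the lower-numbered attitude.
scores, n_pairs = preference_scores(50, lambda i, j: min(i, j))
print(n_pairs)     # 1225
print(scores[0])   # 1.0
print(scores[49])  # 0.0
```

Because every comparison has exactly one winner, the scores of any one subject always sum to the same total, which is the sense in which this measure is ipsative by construction.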

(4) Immediate Memory (No. 6 in above list; used on all
attitudes). A series of 500 brief statements, 10 to an attitude,
equally divided among those pro and con each of the attitudes,
were presented tachistoscopically at 6-second intervals. They
were presented in a series of 42 discs, each consisting of 12
statements randomly mixed from among the fifty attitudes. As
examples of the five pro or facilitating and five frustrating
stimuli used in connection with each attitude we may take
from attitude 6 (wanting success in one’s career) the two
statements “Success in career assures happiness” and “The
successful careerman is always selfish.” S was told at the
beginning that after every 12 statements (and a pause of 25
seconds) he would be asked to recall, in 30 seconds, all that he
could remember of “the phrases, statements or ideas presented
in the last period.” Credit for recall was given when the
essential idea of the item was reiterated regardless of verbal
form. This same situation and set of attitudes was used
simultaneously to get the P.G.R. responses described below.

(5) Distraction (No. 8 in above list; used on attitudes 36
through 40). Statements similar to the above were exposed,
ten to each attitude but intermixed. S was told he would be
given 10 seconds to look at each statement and might be asked
to repeat it (he was asked intermittently) as well as to recall
the nonsense syllables scattered around the statement. Twelve
or thirteen nonsense syllables were in the margins around each
statement. S was given 10 seconds to write down the recalled
items.

(6) Information (No. 10 in general list; used on all
attitudes). Ten information items, each with multiple-choice
selective answers, were presented for each attitude. The
information dealt, not with the object (which would measure
total interest in the object) but with knowledge required in
following the course of action connected with the attitude. S
was asked to leave no item unanswered but to guess. The score
was the total number right. A typical example may be taken from
attitude 22, on wanting to follow football games as a spectator:

“In the (Orange, Sugar, Cotton) Bowl game of January 1st, 1948,
(Georgia, Michigan, S.M.U.) defeated (Alabama, California).”
(7) Speed of Decision (No. 11 in general list; used on
attitude No. 1). Ten questions were presented for each attitude.
They were chosen to be such that all S’s would give some degree
of affirmative answer, and S’s were told to give an answer in the
form “Probably,” “Yes” or “Certainly,” i.e., definitely and
emphatically yes. For example, “Do you want the sale of liquor
to children to be prohibited?” This uni-directional response was
necessary because previous research has indicated (15) that a
short decision time is associated both with very affirmative and
very negative responses. We need a question such that reaction
time would work only in one direction.

(8) Misperception (Illusion) (No. 13 in general list; used on
attitude No. 2). Ten attitude statements, positively expressing
the attitude, were presented for each attitude. S was
instructed to expect 1-second tachistoscopic exposures of
sentences, to repeat what they said and to note any
misspellings. Sentences were such as “I want to eat a chocholate
sundea,” “I want to reduse my weight thruogh work.” Ten
statements not connected with any dynamic need were presented
as a control on S’s normal carefulness of spelling perception.

(9) False Belief (Delusion) (No. 14 in general list; used on
attitudes 41, 42, 43 and 44). Ten statements for each attitude
were presented to S as an “Information Test.” The five
multiple-choice alternative factual endings to each statement
were such as to give greater or less factual support to the
attitude S might desire to maintain. Thus on attitude 44,
“During the war church attendance increased greatly and since
V-J day it has (declined slightly; tended to increase still more;
stayed at its high peak; returned to its pre-war level; fallen to
its lowest point since 1920).”

(10) Fluency (No. 18 in general list; attitudes 31, 32, 33, 34,
35). S was shown each of the ten statements originally used
to express each attitude and was told to write as much on the
topic of each as possible in 1 minute. It was noted that this
“fluency” increased slightly but steadily in successive
attitudes, so S was run through the attitudes in both
directions. At this administration no check was kept of
relative fluency on pro and con statements. The score was the
total number of words produced.

(11) Speed of Reading (No. 19 in general list; tried on
attitudes 14, 15, 16, and 17). Six statements were presented
for each attitude, three favoring the attitude and three against
it, but in random order. S was timed on reading statements
aloud, the negative item speed being subtracted from the
positive on the assumption that S would read more rapidly
those statements which expressed his desires.

(12) Magnitude of Psychogalvanic Response (No. 21 in
general list; tried on all attitudes). The P.G.R. was applied
with the technical conditions described in earlier work by the
senior author (2), the deflection being measured in percentage
of the absolute resistance. For each attitude the deflection was
taken to tachistoscopic exposures of five statements favoring
the attitude and five opposing it, the instructions and exposed
statements being those used in the Immediate Memory Test.

TABLE 1

Reliabilities of “Tried” Methods

                                      Attitude Numbers*
Method                 1    2    6    10   11   16   19   22   30   34   36   44   z score
(10) Information      .64   …    …   .20  .86  .14  .68   …   .90  .31  .36   …    .59
(21) Psychogalvanic    …    …    …    …    …   .84  .88   …   .96  .80  .70  .63    …
(4) Preference        .70  .89  .88  .88  .88  .91  .96  .87  .67  .92   …   .98   .90
(6) Immed. Memory      …   .13  .47  .93   …    …    …    …    …    …    …    …    .50
(2) Time Exp.          …    /    …   .96  .97   /    …   .99   …    /   .94   /    .94
(1) Money Exp.         /    …   .48  .98  .96   /   .99  .99  .96   …   .67   /    .94

* These correspond to the numbers in the complete description of
fifty attitudes in (8).

Scoring was carried out for facilitating and frustrating sets 
separately and also for all together, as discussed below. 

Results 

As indicated above, the measurement of each attitude was
split wherever possible into two sets of five items, in order to
get a reliability; but, where the measures had first to be split
into pro and con items, the reliability was reduced to two items
against three.

The reliabilities for test forms applied to all twelve attitudes
and corrected to 10-item length are as shown in Table 1.
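The text does not name the formula used to correct a split-half correlation to 10-item length. The conventional choice is the Spearman-Brown formula, and the sketch below assumes that is what was meant; the function name and the example value of .60 are illustrative only.

```python
def spearman_brown(r_part, length_factor=2.0):
    """Estimated reliability of a test `length_factor` times as long
    as the part on which r_part was obtained (Spearman-Brown formula,
    assumed here; the source does not name its correction)."""
    return length_factor * r_part / (1.0 + (length_factor - 1.0) * r_part)

# Invented example: two five-item halves correlate .60, so the
# estimated reliability of the full ten-item form is .75.
print(round(spearman_brown(0.60), 2))  # 0.75
```

The same function with a larger `length_factor` gives the reliability to be expected if a short form were lengthened further with comparable items.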

For Immediate Memory with unfavorable statements (Attitudes
6 and 10) the reliability was .31; for facilitating statements,
.45; for the Distraction measure (Att. 36), .64; for Speed
of Reading (Att. 16), .79; for Misperception (Att. 2), .43; for
False Belief (Att. 44), .53; for Fluency (Att. 34), .68; and for
Speed of Decision (Att. 1), .90. Apart from the methods of
comparing the speed of reading of favorable and unfavorable
views and the method of misperception of spelling, therefore,
any failure of a method to attain recognizable validity cannot
be imputed to any large extent to unreliability of the tests.
These two methods, as well as the immediate memory method,
however, evidently need improvement in items and procedure,
to gain reliability sufficient for a more exact appraisal of
validity. Information and P.G.R. could also be improved on
certain attitudes which offer specific difficulties in test item
design. For example, Attitude 10, “I want more time to enjoy
sleep and rest,” evidently makes severe demands upon the
experimenter’s subtlety in choosing information items
connected with this interest, for the ten items used attained a
split-half reliability of only .20.

For the six methods used on all twelve attitudes, twelve 
correlation matrices were worked out, and averaged (cell by 
cell), by Fisher’s Z function, to give the values in Table 2 
for the mean intercorrelation of the different methods applied 
to a representative set of attitudes. 
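Averaging correlations through Fisher's z can be sketched as below. The function names and the sample values are hypothetical; the only thing taken from the text is the procedure of averaging several matrices cell by cell after the z transformation.

```python
import math

def mean_r_via_fisher_z(rs):
    """Transform each r to z = atanh(r), average, and transform back."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))

def average_matrices(mats):
    """Cell-by-cell Fisher-z average of equally sized correlation matrices.
    The diagonal is set to 1.0 directly, since atanh(1) is infinite."""
    n = len(mats[0])
    return [[1.0 if i == j else mean_r_via_fisher_z([m[i][j] for m in mats])
             for j in range(n)]
            for i in range(n)]

# Invented correlations between two methods on three attitudes.
rs = [0.20, 0.50, 0.80]
print(mean_r_via_fisher_z(rs))  # slightly above the plain mean of 0.50
print(sum(rs) / len(rs))
```

The z average gives a little more weight to the larger coefficients than a plain arithmetic mean does, which is why the two printed values differ.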

No factor analysis has been attempted on so few variables,
but what is substantially the loading of each method in the
first general factor has been indicated by averaging its
correlations with all other methods. This “internal validity” we
shall take as the best basis for deciding the relative validities
of the various methods.
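The “internal validity” index described here amounts to a row mean of the intercorrelation matrix with the diagonal left out. A minimal sketch follows, using an invented three-method matrix and a plain arithmetic mean; whether the authors averaged these values directly or through z is not stated, so the plain mean is an assumption.

```python
def internal_validity(corr):
    """Mean correlation of each method with all the other methods."""
    n = len(corr)
    return [sum(corr[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]

# Invented intercorrelations among three methods.
corr = [
    [1.0, 0.4, 0.2],
    [0.4, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
print([round(v, 2) for v in internal_validity(corr)])  # [0.3, 0.35, 0.25]
```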
The calculation of the standard error on these correlations is
somewhat complex. Each r in the body of the matrix is the
mean of eight to twelve r’s on 40 men each. Since they are
averaged through Fisher's function the standard error of each
would have $\sqrt{N-3}$ in the denominator, so that the standard
error of the mean would be equivalent to an r on a population
of between $(N-3) \times 8$ and $(N-3) \times 12$, i.e., 296 to
414. However, the validation r’s are each the mean of five r’s
each with the above standard error. The fact that the five
latter represent independent experiments but not independent
groups creates some difficulty, but assuming independence
through experiment and applying the $N-3$ denominator we arrive
at from 1480 to 2,070 cases as the population on which the r’s
in Row 1 are based. On this basis the validities of methods 1,
2, 3 and 4 are significant at the 1 per cent level, 6 at between
the 1 and 5 per cent levels and 5 barely at the 5 per cent
level, though its correlations are consistently positive.

The results for the six newer methods are set out in Table 3,
which shows, first, the reliability of the measurement and the
attitude (numbered as above) upon which it was tried; second,
its correlations with the best four methods (1, 2, 3 and 4) above;
and last, its mean correlation with all methods tried, usually
six.

Speed of decision, false belief and distraction are the only
methods in which the pattern of correlations indicates some
validity (at the 5 per cent level). Evidently the finding of
Bruner (1) that misperception effects can arise from attitudes
is one which shows up in differences of means but is not strong
or constant enough to show up in the more exacting examination
by correlations and with methods of this kind.
Speed of reading seems unrelated to agreement with the views
read and there is only a faint suggestion that fluency is related,
though both show their highest correlation with the best
method, namely Preference. These and the other newer methods
are being tried out again, each on ten attitudes, since the
peculiarities of a single attitude, as in the present research,
may give an unfair impression.

Certain possibilities in both the more basic and the more
exploratory methods remain to be examined, notably (a) the
possibility that higher validities will be found with ipsative (4)
than with normative scoring, (b) the possibility that some
relations are curvilinear, (c) the possibility that there are
contrasting effects not only between stimuli that have to do
with an attitude and those that do not, but also between those
that favor and those that frustrate the attitude.

It will be remembered that ipsative scoring expresses the
score relative to some average or total of the given individual,
whereas normative scoring expresses it relative to the
distribution in the group (4). Where the raw score expresses
some real interaction of the individual with his environment,
some behavior that may be considered a real function of
interest, as the tests of information, time and money
expenditures, etc., do, the present figures were scored
normatively, i.e., in standard scores, before correlating.
Preferences, the P.G.R. and the Immediate Memory tests, however,
were scored ipsatively, for in the last case, for example, the
immediate score is clearly relative to the individual’s
standards. His intelligence and memory may be such that he
exceeds the score of another person on a particular attitude
even though his interest in that attitude is quite small. In
the second case individual physiological differences in
reactivity (one person may have an average P.G.R. deflection
five times as large as another) need to be corrected. The first
method, preferences, is automatically ipsative in scoring, since
each person has the same total.
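The contrast between the two scorings can be made concrete with a small sketch. The data are invented, and the particular ipsative form used (each score as a fraction of the person's own total) is only one of the possibilities the text allows under “some average or total of the given individual.”

```python
import statistics

def normative_scores(raw):
    """Standard score of each person on each attitude, relative to the group.
    raw[p][a] is the raw score of person p on attitude a."""
    n_att = len(raw[0])
    cols = [[row[a] for row in raw] for a in range(n_att)]
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(row[a] - means[a]) / sds[a] for a in range(n_att)]
            for row in raw]

def ipsative_scores(raw):
    """Each score as a fraction of that person's own total (assumed form)."""
    return [[x / sum(row) for x in row] for row in raw]

# Invented P.G.R. deflections on three attitudes: the second person
# reacts five times as strongly throughout.
raw = [[1.0, 2.0, 3.0],
       [5.0, 10.0, 15.0]]
ips = ipsative_scores(raw)
print(ips[0] == ips[1])       # True: identical profiles once reactivity is removed
print(normative_scores(raw))  # [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]
```

Scored normatively, the more reactive person appears to hold every attitude more strongly; scored ipsatively, the two persons show the same ordering and spacing of attitudes.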

This is no place to attempt a discussion of the
ipsative-normative scoring problem, which, however, must be
recognized as peculiarly insistent in the field of interest
measurement and has, for that reason, been fully discussed in a
first approach to the theory of interest testing (3). There is
as yet no simple solution and indeed a claim can be made for
putting almost any interest measure on an ipsative basis before
putting it into normative scores. For example, the extent of
the need expressed in a money expenditure can only be properly
gauged when we know how much money the individual possesses.
However, in this dilemma we have thought it best to turn to
ipsative scoring only when the individual differences in mean
scores are patently great and when there are good reasons for
believing that some personal constant, e.g., physiological
reactivity or general power of immediate memory, mediates
strongly between the behavioral expression of interest and the
particular manifestation we have chosen to test.

No digression comparable to the above will be taken into
curvilinearity. Suffice it that one investigator (15) has shown
that speed of decision is related to strength of conviction in
bimodal fashion, a quick decision being made where attitudes
are strongly for or against the question. A similar complexity
has been found in the relationships of P.G.R. response and
memory value (16) and P.G.R. response and speed of
decision (10).

However, in our correlation plots we have encountered no
persistent curvilinearity, and with the exception of suggestions
thereof in speed of decision, P.G.R. response, fluency and speed
of reading, which require further investigation, we believe that
there is no measurement problem in this respect.

On the other hand, the problem of differences between the
effects of statements favoring the successful expression of an
attitude and those frustrating it is a very real one for certain
methods, and in one method, the P.G.R., we had reason to
believe that the poor validations obtained were due to the
neutralization of two conflicting significant responses. Our
search in this direction was stimulated also by the finding of
Whately Smith (16) of a curvilinear relation between memory
value of words and their P.G.R. deflection, such that the
largest deflections were found both with words very well
remembered and words very poorly remembered.

Consequently, in the ten items exposed both for the P.G.R.
measures and for the Immediate Memory Test, five were made
“facilitating” and five “frustrating.” Items for P.G.R. and
Immediate Memory were separately scored and correlated. Owing
to the complexity of the inter-relations and the lack of
significance of some of the results, only the positive
indications will be briefly set out, as follows:

Attitudes evoking larger deflections on facilitating items tend
to have also larger deflections on frustrating items than do
other attitudes (r = .29 and .36) and the same occurs to a lesser
extent in immediate memory measures.

Larger deflections on facilitating items in an attitude are
associated with poorer immediate memory for that attitude,
particularly in its frustrating items (−.25 and −.36). The
implications of the last statement, together with the Whately
Smith findings, are clearly that both immediate memory and the
P.G.R. have a more complex relation to interest than the
simple linear one hoped for in this exploratory study. The
bearings of this on further research are discussed below.

Discussion 

Some observations not reducible to the above statistical
digest need first to be added. These concern mainly the
operation of particular methods and can be presented seriatim.

It was the general opinion of the experimenters that the
reliabilities obtained for the expenditure of time and money
methods were higher than the true dependability of the
observations warranted. Subjects, on close examination, were
found to have been careless about their records of actual
expenditures and to have made guesses, the similarity of which
in the two weeks in question raised the apparent reliability.
It is suggested that in further, more intensive experiments
these records be kept in more detail and over longer periods
than one week. In two attitudes, notably that dealing with
expenditure on the wife and among students with very restricted
means, some experimenters noted a curious tendency to inverse
relationship between the amount spent and the stated intensity
(Preference score) of interest. This general problem of the
tendency of conscious, verbal intensity to be related to the
extent of the frustration of the need rather than to the basic
amount of need satisfaction occurring in the given attitude
justifies special investigation.
In the Immediate Memory Test the impression of experimenters
while administering it was that it was not working very
well. The usual positional effects were noticed (first and last
in each run of 12 being best remembered) but these were
cancelled as far as possible by giving each attitude an equal
positional chance. Since briefer items were apparently more
frequently remembered it is suggested that future work attempt
to bring all stimulus statements to five-, six- or seven-word
length. There is some evidence, additional to that implicit in
the above correlations, that good validity could be obtained for
a memory method concentrating on failure to remember
frustrating statements. In one attitude r’s of .40 with
Preference and .14 with Time and Money were obtained for this
“memory failure with contrary statements” score.

Both in the memory test and in the P.G.R. some distortions
were produced by items which were unintentionally embarrassing
or amusing, and subjects were suspected of not repeating
the former even though they remembered them. Experimenters
also suspected that dynamic effects, both in memory and
P.G.R., tended to spread from a particular item to the items
that happened to be neighbors. Some of the poorness of validity
of the P.G.R. test was believed by most experimenters to arise
from purely technical difficulties, e.g., change of meaning of
the size of deflection with different absolute resistances, so
that improved apparatus, such as the self-recording and more
accurately balanceable instrument since constructed, is expected
to yield validities equivalent to the other methods. It is also
suggested that one or two “buffer” items be introduced before
each run of a dozen or so stimuli, since it was noted that the
first items after an interval tended, regardless of
significance, to produce appreciable deflections.

However, the use of the P.G.R. and Immediate Memory
Methods can never be satisfactory until the problem of the
relative significance of responses to “facilitating-frustrating”
stimuli, involving the above-mentioned Whately Smith effect,
has been cleared up. The senior author believes that the current
use of the P.G.R. could best be improved by using solely nocive
(a specific variety of frustrating) stimuli and counting the
response as a true function of the strength of the attitude
threatened.
Improvement of the promising “Distraction” test is suggested
through employing memorizing material more finely divisible
and easier to remember than nonsense syllables. Numbers
would be one such medium.

In the equally promising Speed of Decision method it is
possible that some useful compromise method might be worked
out in which the extent of the subject’s stated agreement or
disagreement would be taken into account as well as his
decision time. This would bring the advantage that questions
inviting negative answers could also be used and the
experimenter would not need to strain his ingenuity seeking
questions that admit only of various degrees of positive
answer. The relation of decision time to degree of positiveness
found in this method (for attitude No. 1) is shown in Table 4.

TABLE 4

Relation of Speed to Positiveness of Decision

Response                               Probably   Yes   Definitely
Times response given for 40 subjects   315        442   443
Average seconds per response           1.9        2.2   1.7

Although the above relation might not represent a correlation
of more than .10 or .20, the combination of a speed score
with a degree-of-assent score should reach an appreciably higher
validity.

So much for special methods. In the experiment as a whole
the chief weaknesses resided in: (1) the great demand on the
subject’s time, which tended to produce fatigue and boredom
inconsistent with good cooperation; (2) the multiplicity of
experimenters (seven different people in various aspects of the
undertaking); (3) the defectiveness of individual test items,
notably in the Information, Immediate Memory and Misperception
tests, due to absence of item analyses.

The first is unavoidable, except with expensively hired
subjects, if many methods are to be cross-validated in a widely
planned exploratory study, but need not interfere in the more
restricted local studies that can now be carried out with the
knowledge here presented as to the general field. The second
may be a blessing in disguise: if a method is such that it yields
valid results in the hands of several experimenters one may be
sure that it is a well-defined method and one valid in many
circumstances. The third raises the general problem of whether
item analyses should be carried out before or after the validity
of a certain type of test has been established. The writers
believe that in exploratory studies the items should be designed
on a sufficiently clear general principle. If this proves to have
any validity the less valid items can later be combed out by
item analysis (consistency with the test as a whole).

The above considerations may indicate why the validity co¬ 
efficients of some methods have been called "acceptable and 
promising," even though the correlations, significant at only 
the 5 per cent level, are still short of what would normally be 
considered good validity. Our first aim was an exploration to 
discover new methods of any real validity. The second aim, of
improving them to practicable validity, can be predicted to
encounter difficulties in certain cases. For when corrected for
attenuation by low reliability the correlations with the criterion 
for most of the above methods still hover only between 0.3 and 
0.5, and we accept the position of Guilford that in psychometry 
validities below 0.5 are not of much practical use. 

However, the improvements indicated above are likely to 
raise the validity more than the reliability, and it is, moreover, 
possible that the present reliabilities, as indicated above, are 
overestimated for certain tests. Nevertheless, even if it be sup¬ 
posed that the validities of the separate methods could never 
be raised above 0.5, a very acceptable and effective battery 
could be made from a combination of half a dozen of these 
methods. For apparently only to a slight degree accounted for
by error, what is the specific element in each? Most likely it is 
a combination of (a) other dynamic traits partly determining 
interest in the specific items chosen to represent the attitude, 
(b) individual abilities and temperamental qualities affecting 
the given medium of measurement, e.g., power of memory in 
the memory test, autonomic reactivity in the P.G.R., (c) life 
circumstances which cause certain expressions of the attitude to 
be unused or inhibited in certain persons. 

There is obviously much scope for research here, both on the
sources of chance error in our measurements, i.e., on determining
the physiological, instrumental and other causes of low reliability
of the dynamic measurement, which, for the moment,
we have brushed aside as “chance error,” as well as on the more
systematic specific factors discussed in the last paragraph, but
our interest at this stage has been to pursue the element of real
validity, leaving the causes of non-validity for later examination,
wherever some validity is found.

These last considerations raise a question to which both space
and the roughness of data compel us to give only a tentative
answer here. By taking validity as the mean correlation with
the pool, we have implicitly assumed that only a single general
factor is of importance. There is, however, some indication of a
less clearly developed block of intercorrelating methods, additional
to the main block, including time and money expenditures,
preference and information. It shows itself best in the
correlations for one or two particular attitudes (6 and 10) where
a significant cluster appears in Immediate Memory (failure to
remember statements contrary to the attitude), Preference and
Projection (averaging .36 and .17 in the respective attitudes)
tests and slightly in Information and P.G.R., but scarcely at all
in time and money expenditures or memory for favorable,
facilitating statements. This may be that special aspect of an
attitude strength represented by unsatisfied drive, but until
further studies confirm the pattern, discussion would be premature.


Conclusions 

1. From the administration of tests of attitude strength 
(“interest in a defined course of action”) involving twelve 
different methods, applied to most of twelve different attitudes, 
the mean reliability of each method and the mean correlation 
of each method with the other methods was obtained. 

2. The reliabilities varied from moderate to good, but only
eight of the methods had validities that were significant.

3. The validities were defined as the mean correlation with a
pool of four or six “tried” methods, which were set aside at the
beginning as psychologically sound criteria and of some tested
worth. These were: Expenditure of Money, Expenditure of
Time, Stated Preference in Paired Comparisons, Information
Implementing a Course of Action, Immediate Memory and
Psychogalvanic Response to statements concerning the attitude.
Only the first four of these reached incontrovertible
validities.

4. The comparative failure of Immediate Memory and the 
P.G.R., despite good previous indications, seems traceable to 
complex relations, notably the Whately-Smith effect, differ¬ 
entiating the Memory response and the P.G.R. response (but 
in different ways) respectively to facilitating and frustrating 
verbal stimuli. 

5. In several methods where the interest response is mediated
by the extent of the individual’s possession of some secondary
personality factor, large differences appear between the average
magnitudes of the individuals’ mean responses to all attitude
interests, and it is then necessary to rescale the score ipsatively
before correlating.

6. Among the more “tentative” methods, which were correlated
with the core of “tried” methods on one attitude each
(but not with each other), the reliabilities were of the same 
satisfactory order. Promising validities were found for the 
methods of Distraction, False Belief and Speed of Decision, 
suggestions of validity were found for Fluency, while Misper¬ 
ception (Illusion) and Speed of Reading had no validity. 

7. All of the tests were very short (10 items each), the
purpose of the investigation being only to pick out, among an 
array of new psychological approaches, those possessed of any 
validity at all. Lengthening of the tests would raise the val¬ 
idity of six of them to about 0.5, of four others to .3 or .4. 
Item analysis might raise it somewhat more, but the over-all 
results seem to indicate that some real specifics are neces¬ 
sarily being measured by the specific methods and that a satis¬ 
factory objective measure of an attitude will only be obtained 
by a battery employing four to six different methods. 

8. Various results, notably the existence of a cluster among
some methods on the fringe of the main cluster, give slight 
indications that there is some functional separation of that 
part of the strength of an attitude which arises from its 
frustration. 

9. From the experience of the four experimenters in the design
and conduct of the experiment some suggestions for improvement
when carrying the research further are offered.
Together with the methods explored in an extension of this 
research (8) the present methods constitute a set of eight new 
methods (Information, Immediate Memory, Preference, Speed 
of Decision, False Belief, Psychogalvanic Response, Projection 
and Distraction), additional to the criterion methods of Money 
and Time Expenditure and the classical Opinionaire (which 
they equal in validity), available for further use. Two directions 
of research now open up: (a) the improvement of the above 
valid methods by concentration on each technique singly, in 
relation to a standard validating core, (b) the exploration of the 
nine untried methods (Nos. 5, 7, 9, 12, 15, 17, 20, 23 and 24 in
the primary list above) described in this same theoretical 
scheme. 

Since the successful contribution of psychology to the much 
needed integrating studies in the social sciences, with economics, 
anthropology and sociology, depends to a large extent on the 
psychologists’ ability to supply objective and accurate means of 
measuring strength of motive, interest or attitude, i.e., of dy¬ 
namic traits generally, it is to be hoped that the present 
exploration will be a foundation and stimulus for vigorous re¬ 
search in this area. 

The writers wish to express their gratitude to the Graduate 
Research Board of the University of Illinois and to the Social 
Science Research Council for funds contributing to the com¬ 
pletion of this research. 

THE CONSTRUCTION AND VALIDATION OF A
WORK-TYPE AUDITORY COMPREHENSION 
READING TEST 

GEORGE SPACHE 
Chappaqua, New York 

We believe that there is a need for a test to determine the 
potential ability of students to comprehend and use high- 
school and college-level reading materials. This test should be 
relatively free from the influence of intelligence, as commonly 
measured, and independent of the influence of any reading 
difficulties of the individual. It should serve to indicate the 
possible performance level in silent comprehension and auding 
abilities. In our opinion, such a test would replace the use of 
common intelligence tests in estimating potential reading abil¬ 
ity. 

Such a test would be preferable to the use of an intelligence 
test because the latter is not necessarily a good indicator of po¬ 
tential reading performance. Intelligence is itself a potential 
which is not achieved to equal degrees in all areas of communication.
There is no good reason why an intelligence test should
be very closely related to reading ability or more significantly 
related to comprehension than to writing or speaking skills. 
We see no reason why one measure of potential general ability 
should be the best estimate of probable performance in many 
specific skills. 

A second reason against the use of intelligence tests to pre¬ 
dict reading comprehension is the extent of common content in 
such tests. Many intelligence tests actually function as reading
tests and their results are merely a measure of reading status 
rather than an estimate of future or possible performance. 

Finally, intelligence tests do not function as accurate meas¬ 
ures of potential reading skill because reading performance is 
not dependent solely upon intelligence. Such factors as ex¬ 
posure to reading materials, socio-economic status, attitudes 
toward reading, etc., definitely influence reading performance.
These are sufficient to explain many of the observed discrepancies
between intelligence and reading test results.

For these reasons, we have attempted to devise a pair of 
comparable tests that would determine present reading comprehension
status and the potential ability of the student to
improve his silent comprehension. The tests were arranged to
parallel each other by selecting comparable passages from common
high school and college texts in science, literature and
social science. Two forms of each test comparable in length,
difficulty and types of reading passages were constructed. The
Silent Comprehension Test requires the pupil to read the passages
and to answer questions in the usual manner. In the
Auditory Comprehension Test, passages and questions are read
to the student. Thus, we obtain comparable measures of performance
and potentiality.

Possible uses of these tests are numerous.¹ The present status
of an individual in ordinary silent comprehension can readily
be determined. With this knowledge it is possible to detect the
extent of comprehension difficulties. The use of the auditory
type of test would indicate whether ordinary remedial procedures,
or specific training in auding skills (as auditory vocabulary,
organizing and summarizing, taking notes, etc.) were
necessary or likely to be profitable. To be specific, low scores in
silent comprehension in the presence of average or better auditory
comprehension would indicate that common remedial techniques
would probably be profitable. Low scores in both tests
would indicate a degree of low potential for high-school or
college work not likely to be improved except by extensive and
prolonged remedial help. Average or better scores in silent
comprehension with low auditory comprehension would indicate
the need for special training in auding or auditory skills.

The results in terms of total score on the first edition of the
Auditory Comprehension Test were correlated with other sections
of the Diagnostic Reading Test battery as well as measures
of intelligence and reading.

1 These tests may now be obtained from Dr. Frances O. Triggs, 419 West 09th
Street, New York 17, N. Y. They are published by the Committee on Diagnostic
Reading Tests, a non-profit corporation devoted to the study and improvement of
reading procedures.
In view of a reliability coefficient (S-B) of .788 for this edition,
these correlations would seem to support our hope that
this test would be a measure of factors operating in reading
comprehension. The relationships with the measures of silent
comprehension are of the order of .5; those with vocabulary and
intelligence of the order of .3, with the exception of that with the
Diagnostic Vocabulary Test; while those with rate, word analysis
and intelligence range from .4 downward.

TABLE 1

Relations of Scores on Auditory Comprehension Test to Various Other Measures

Auditory Comprehension with:
  Terman McNemar IQ ....................... .358
  Cleveland Reading
    Vocab. ................................ .358
    Comp. ................................. .512
  Diagnostic Reading
    Vocab. ................................ .675
    Gen. Read. Rate ....................... .167
    Gen. Read. Comp. ...................... .493
    Social Studies Rate ................... .299
    Social Studies Comp. .................. .582
    Word Attack ........................... .400
    Oral Reading (errors) ................. .177


TABLE 2

Intercorrelation Matrix and Reliabilities (K-R 21) of Total and Part Scores on the
Diagnostic Reading Tests, Section II, Comprehension Part 2, Auditory Form A

(N = 162)

Coefficients of Correlation

                                        2        3        4         5            6
                                K-R     Main     Details  Conclu-   Physical     Social Sciences
                                21      Ideas             sions     Science      and Literature
                                        (47      (63      (25       (47          (88
                                        Items)   Items)   Items)    Items)       Items)
K-R 21                                  .72      .38      .48       .43          .66
1. Total Score (135 items)      .73     .86      .58      .72       [illegible]  .92
2. Main Ideas (47 items)        .72              .23      .62       .64          .82
3. Details (63 items)           .38                       .15       .56          .52
4. Conclusions (25 items)       .48                                 .54          .71
5. Physical Science (47 items)  .43                                              .56

Our finding that there is a significant relationship between
silent comprehension and the comprehension of material read
to a student is similar to the results obtained by Swanson and
Anderson.² These authors also found that results in the two
situations tended to be markedly similar.

A second edition attempted to differentiate questions into
subgroups determining the comprehension of main ideas, details
and conclusions. Questions on social science and literature
were also distinguished from those in physical science in the
hope that comprehension in these types of questions and subject
matter could be measured. Unfortunately, the intercorrelation
matrix of part scores did not support this attempt.

Reliabilities of the subscores range from .38 to .72 and the
intercorrelations of sub-sections from .15 to .82. With the possible
exception of Main Ideas, none of the subscores is sufficiently
reliable to justify its distinction. Item validities ranged in
median values from at.i for Details, to 40.fi for Main Ideas and
ly.t and 28.5 for Forms A and B, respectively. Median-item
validities for Social Science and Literature and for Physical
Science questions were afi.J and 22.7. With this evidence, no
attempt was made to differentiate types of questions or subject
matter in the final revision.
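
The part-score reliabilities discussed above are K-R 21 coefficients. As a minimal sketch of how such a coefficient is obtained, the function below applies the standard Kuder-Richardson formula 21, which needs only the number of items, the mean score, and the score variance. The function name and the numerical inputs are illustrative assumptions; the means and variances of the actual part scores are not given in the text.

```python
def kr21(n_items: int, mean: float, variance: float) -> float:
    """Kuder-Richardson formula 21 estimate of reliability.

    Uses only the number of items, the mean total score, and the
    variance of total scores; it assumes items of equal difficulty.
    """
    if n_items < 2:
        raise ValueError("K-R 21 needs at least two items")
    if variance <= 0:
        raise ValueError("score variance must be positive")
    item_term = mean * (n_items - mean) / (n_items * variance)
    return (n_items / (n_items - 1)) * (1.0 - item_term)


# Hypothetical figures for a 25-item part score (not taken from the study).
example = kr21(n_items=25, mean=15.0, variance=12.0)
print(round(example, 2))  # about .52
```

Because the formula rewards a large score variance relative to test length, short part scores with restricted spread will tend to yield low coefficients of the kind reported for Details and Conclusions.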

Before undertaking the third and final revision, we thought
it desirable to investigate the influence of chance and informational
background upon scores in the Auditory Comprehension
Test. We have often felt that many questions in other reading
tests could be answered by a student without ever reading the
test material. In fact, we confirmed this impression in a study
of another test in the Diagnostic Test Battery. In a measure of
silent reading comprehension, we believe that this situation
would be highly undesirable, since it would vitiate the attempt
to measure comprehension in a specific body of reading materials.
Since the purpose of the Auditory Comprehension Test is
to measure potential, and not performance, in reading, the
fact that the student may be able to answer a number of questions
even though he has not read the test material does not
invalidate the test. If we are to measure potential, then the influence
of reading backgrounds and information should be allowed
to operate to a reasonable degree since they are contributors
to this potential.

2 Swanson, D. E., and Anderson, I. H., "A Comparison of Comprehension Scores
Obtained from Silent Reading, Oral Reading and Auditory Comprehension." Unpublished
research as quoted by D. E. Swanson, in "Common Elements in Silent and
Oral Reading," Psychological Monographs, XLVIII (1937), 36-60.

A group of 33 high-school pupils whose socio-economic status
and intelligence were relatively high were able to answer a
median of 58 per cent of the questions correctly. We do not
know whether this figure should be greater or less, since we
know of no comparable data. It would imply that about half
of the questions of the Auditory Comprehension Test can be
answered on the bases of intelligence, reading background and
the other factors that influence reading skills. The remainder
of the questions are, presumably, dependent upon the ability
to comprehend specific high-school and college textual materials.
Thus the test may be measuring potential both by sampling
the capacity for understanding a group of selections from
common texts and by measuring the facility in using reading
or informational experiences.

The third and final editions of the Auditory and the Silent
Comprehension tests were based on these experiences with the
two preliminary editions. The parallel nature was preserved
and the similarity between the tests increased by making them
of the same length. The final editions are composed of approximately
50 items and require about one class period for administration.
We believe that judicious use of the tests will make
possible comparisons between present status and potential performance
in reading and auding, as well as a prognosis of the
probable outcome of remedial help or training in auding skills.

GENERAL MECHANICAL APTITUDES TEST FOR
THE SELECTION OF CIVILIAN EMPLOYEES IN
WAR DEPARTMENT INSTALLATIONS¹

ADAM PORUBEN, JR.

Metropolitan Life Insurance Company


During World War II, the Civilian Subsection of the Personnel
Research Section, The Adjutant General’s Office, was
engaged in the construction, standardization, and validation
of various aptitude tests for the selection and placement of
civilian personnel in various War Department installations.
The General Mechanical Aptitudes Test was one of these tests.
It was derived from four tests that already had shown some
validity for the selection of employees for mechanical jobs.
The study here reported was carried out in 1945. The writer,
who was on the staff of the Civilian Subsection, was assigned
this particular subject because he had been a teacher of Related
Mathematics and Sciences for several years in the Saunders
Trades School, Yonkers, N. Y., where this test was tried out,
and he was, therefore, in a better position to evaluate the reliability
and validity of the criterion data than someone who
was not acquainted with the school.


Purpose of Study 

The immediate objective of this study was to determine the
validity of the General Mechanical Aptitudes Test for the prediction
of success in industrial and technical high schools.
The ultimate aim was to determine its validity for the selection
of civilian employees for various mechanical jobs in War Department
Installations. Since most of the graduates of the
Saunders Trades School go into their respective trades and
specialties upon graduation and are fairly successful in their
work, it was hoped that the validation of the General Mechanical
Aptitudes Test for the prediction of success in this school
would also give some indication of its validity for the selection
of employees for various mechanical jobs which are similar to
those for which the students were being trained in this particular
school.

1 This study was carried out while the writer was on the staff of the Personnel Research
Section of The Adjutant General’s Office. The opinions expressed in this article
are the author's and do not necessarily reflect the official attitude of the Department
of the Army.

This article reports only part of this study. The validation study was carried out
on six groups of students. This article reports the results obtained on the 11th-year
Technical major group. The rest of the study appears in the Journal of Psychology,
XXIX (1950), [pages illegible].

The writer makes grateful acknowledgement to Dr. E. E. Cureton and Dr. Erwin
K. Taylor for their encouragement in carrying out this study; also to Dr. Lawrence
Ashley, Mr. William Carey, and Mr. Patrick McHugh, who permitted this study to
be carried out in the Saunders Trades School, Yonkers, N. Y.


The School 

The Saunders Trades School is an industrial and technical 
senior high school for boys, supported mostly by Federal, 
State, and local funds, but partly by private funds derived 
from the so-called Saunders Fund. This school serves the entire
city of Yonkers, N. Y.; its normal enrollment is over 1,000
students. At the time of this study, however, the enrollment 
was considerably under this figure because of war conditions. 

The Saunders Trades School offers two majors, industrial 
and technical. The industrial major has a duration of three 
years, the students being admitted after they have completed 
the ninth grade in one of the several junior high schools in 
Yonkers. This major is more or less terminal in nature in that
most of the boys are expected to go to work in their respective 
trades upon graduation. The technical major also has a dura¬ 
tion of three years. Its graduates are expected either to become 
junior engineers in industry or to go to engineering colleges for 
further study. The series of courses in this major are so ar¬ 
ranged that the students can meet college entrance require¬ 
ments upon graduation. 

The industrial major consists of seven curricula: Auto Mechanics,
Building Maintenance, Carpentry, Electric Installation,
Machine Shop, Plumbing, and Refrigeration. A student
entering the industrial major selects one of these curricula.
All seven curricula are parallel in nature, but in each one the
student pursues course and shop work along his particular
specialty. The student devotes about half of his time to the
shop and [illegible] and the other half to related
courses in mathematics, science and drafting. The work in
these courses is fairly well integrated with the work in the
shop; that is, the theory and the mathematics involved in a
particular shop instruction unit are first discussed in the related
courses before the shop work is begun. For example, the
student is taught the theory and mathematics of parallel circuits
in electricity before he performs the experiment in that
project in the shop. The degree of integration between the
related courses and the shop projects varies among the seven
curricula, depending largely on the cooperation of the instructors.

The technical major consists of five curricula: Architecture,
Industrial Chemistry, Electricity, Machine Design, and Power
Generation. Each student pursues one of these for three years.
As in the industrial major, the five curricula are parallel in
nature, but, at the same time, specialized. Each curriculum
consists of shop or laboratory work, related courses, and certain
academic courses, mostly English and American History.
The related courses are integrated with the shop and laboratory
work.

The Population

This study was carried out on one of the three groups of
students in the technical major, namely, the 11th-year group.
There were seventy-two students in this group distributed
among the five curricula as follows:


                                Number of
Curricula                       Students
Architectural Course               10
Industrial Chem. Course             8
Electrical Course                  18
Machine Design Course              11
Power Generation Course            14

Total                              72

Description of the Test

The General Mechanical Aptitudes Test was designed to measure
various aspects of mechanical aptitude. It consists of four
subtests as follows:
1. Mechanical Comprehension.—This consists of 43 three-alternative
multiple-choice items administered with a 15-minute
time limit. The items of this test were adapted, by permission
of the author, from Forms AA, BB, and W1 of the Bennett
Test of Mechanical Comprehension. It measures general mechanical
insight and the capacity of an individual to understand
mechanical operations.

2. Technical Reading.—This is a paragraph-and-question
test based on selections from technical manuals and texts.
It consists of 29 items administered with a 15-minute time
limit. The directions and a sample question are shown below.

DIRECTIONS 

This test consists of five paragraphs and some questions
about each paragraph. There are 29 questions in all. Read
each paragraph and then answer the questions which follow.
Read the paragraph as many times as you need to in order
to answer the questions. The first paragraph and the questions
based on it is a sample to show you what to do.

The blast furnace is a great stone chimney 100 feet high
or more. It is filled with a roaring fire from top to bottom.
Into the top of the blast furnace are dumped carefully
measured amounts of iron ore, coke, and limestone. After
4 or 5 hours of terrific heat, molten iron is drained off
from a door at the bottom of the furnace.

46. A blast furnace is made of 

A. iron 

B. stone 

C. clay 

D. coke 

3. Paper Form Board.—This consists of 44 items, adminis¬ 
tered with a 10-minute time limit, and measures the ability 
to manipulate spatial images mentally. The directions for the 
test and a sample question follow. 

DIRECTIONS 

This test consists of 45 problems. At the top of each page, 
there are four large figures labeled A, B, C, and D, like those 
shown in the first row below. Each problem shows one of 
these figures cut into pieces and scattered around in a box. 
Look at the pieces in each box, and decide which one of the 
figures could be made if all the pieces in that box were fitted 
together. Some of the pieces may need to be turned around 
or turned over to make them fit. The pieces in each problem 
will make only one of the figures. The first problem is a sample. 
4. Shop Arithmetic.—This test consists of 20 free-answer
arithmetic reasoning problems based on shop arithmetic. Sixteen
of the items contain diagrams, tables, or drawings, and
the test is administered with a 20-minute time limit. A sample
problem is shown below:

When the larger pulley wheel makes 100 turns per minute,
how many turns per minute does the smaller wheel make?

Procedure 

Before the General Mechanical Aptitudes Test was given,
permission was obtained from the Yonkers school authorities
to transcribe the school grades for use as criteria. These grades
were copied from the school’s progress sheets which list the
grades according to the curriculum and the year. In May,
1945, the test was administered to 4H0 students of the Saunders
Trades School by the teachers after a training session had been 
given by a staff member of the Personnel Research Section. 
All tests were scored at the headquarters of the Personnel 
Research Section. 
Analysis and Results

In order to see whether the seventy-two students used in
this study constituted a fairly homogeneous group, the analysis
of variance technique was used to investigate the question
whether the four subtests of the General Mechanical Aptitudes
Test differentiated significantly among the five curricula of
the technical major. The results, which are not reported here,
showed no significant F-ratios at the 1 per cent level among the
curricula within the technical major. Therefore, the 11th-year
students from the five technical major curricula were combined
into one group and the analysis carried out on this
group.

1. The Criterion.—In a technical high school, such as the
Saunders Trades School, each technical subject has a definite 
place in the total pattern of instruction offered; that is, the 
class-room subjects such as mathematics, science, and theory, 
are definitely related to the shop or laboratory work. These 
class-room subjects provide the student with the basic knowl¬ 
edge and fundamental skill which will enable him to pursue 
his shop studies more intelligently. For example, in the Applied
Mathematics course the students in the Electrical curriculum 
learn the basic mathematics connected with the series and 
parallel circuits; in the Basic Theory course they study the 
fundamentals of the series and the parallel circuits. With this 
background, the student can learn the shop work more easily 
and more intelligently. Moreover, the work in the classroom 
is fairly well integrated with the work in the shop; that is, the 
theory and the mathematics involved in a particular shop 
instruction unit are first discussed in the related courses before 
the shop work is begun. 

Because of this high integration of related subjects with the
shop work, and because the technical subjects do represent
the core of the curriculum of the Saunders Trades School,
it occurred to the writer that a composite of the grades in
these technical subjects would constitute a more valid criterion
than a composite of all of the grades, including the more academic
subjects such as English, Economics, History, etc. Because
of these considerations, the composite of the grades received
by the students during their 10th and 11th years in
the technical subjects was taken as the criterion in the study.
This composite was obtained by the summation of 12 grades
in five different subjects, namely, Basic Theory, Shop, Physics,
Applied Mathematics, and Plane Geometry.

An estimate of the reliability of the criterion was obtained by
correlating the sum of scores for the first terms with the sum
of scores for the second terms. This reliability coefficient was
found to be .88. When this was stepped up by the Spearman-Brown
formula, it became .94.
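
The step-up just described can be checked with a short sketch of the Spearman-Brown formula. The function name and the `factor` parameter are illustrative; a factor of 2 corresponds to doubling the length of the measure, as when two half-composites are combined.

```python
def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Reliability of a measure lengthened by `factor`, given the
    reliability (here, the correlation between halves) of the shorter one."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


# Correlation between first-term and second-term grade sums, as reported.
stepped_up = spearman_brown(0.88)
print(round(stepped_up, 2))  # 0.94
```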

TABLE 1

Intercorrelations Among Tests and the Criterion

(N = 72)

                    Mech.        Tech.        Paper Form   Shop          Criterion
                    Comp.        Reading      Board        Arith.
Mean                [illegible]  [illegible]  [illegible]  [illegible]   919.940
SD                  7.704        [illegible]  [illegible]  [illegible]   83.650
Mech. Comp.                      [illegible]  [illegible]  .394          .493
Tech. Reading                                 .117*        [illegible]   .541
Paper Form Board                                           [illegible]*  .417
Shop Arithmetic                                                          .454

* Not significant at the one per cent level

2. Reliabilities of the Tests.—No estimates of reliabilities of
the four tests were made in this study. Such estimates, however,
were made previously by the Personnel Research Section,
and were found to be quite satisfactory.

3. Intercorrelations.—The intercorrelations among the tests
and the criterion are shown in Table 1. All of the correlations,
except two, were found to be significantly different from zero
at the one per cent level.

4. Multiple Correlations. The multiple correlation was computed
by the usual Doolittle method and also by the Wherry-Doolittle
method in order to show the amount of shrinkage in
the R. By the Doolittle method, the multiple R for the entire
battery was found to be .644, and by the Wherry-Doolittle
method, .624. Thus, the shrinkage in the multiple R was fairly
small, namely, .02.

When the shrunken multiple R was corrected for the attenuation
in the criterion, it became .643, or R² = .413. Thus the
battery accounts for about 41 per cent of the variance of the
criterion.

In order to show the contribution of each test toward the
efficiency of the battery, Table 2 is presented.

TABLE 2

Contributions of the Tests Toward Battery Efficiency

                                      Corrected*    % of Criterion
Test Battery      R²        R         R             Variance

B                 .2938     .542      .559          31.2
B, C              .3790     .616      .635          40.3
B, C, D           .3870     .622      .641          41.1
B, C, D, A        .3891     .624      .643          41.3

A = Mechanical Comprehension     B = Technical Reading
C = Paper Form Board             D = Shop Arithmetic

* Corrected for the attenuation in the criterion, the reliability coefficient being .94.
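
The correction for attenuation in the criterion, and the per cent of criterion variance that follows from it, can be sketched as below. This is a minimal illustration under the assumption that the correction divides R by the square root of the criterion reliability (.94); the function name and the dictionary of shrunken R values are introduced here for illustration only.

```python
import math


def correct_for_attenuation(r: float, criterion_reliability: float) -> float:
    """Divide a validity coefficient by the square root of the
    criterion reliability (correction in the criterion only)."""
    return r / math.sqrt(criterion_reliability)


# Shrunken multiple R for each successive battery, as read from Table 2.
shrunken_r = {"B": 0.542, "B, C": 0.616, "B, C, D": 0.622, "B, C, D, A": 0.624}

for battery, r in shrunken_r.items():
    corrected = correct_for_attenuation(r, 0.94)
    pct_variance = 100.0 * corrected ** 2
    print(f"{battery:12s} corrected R = {corrected:.3f}, "
          f"variance = {pct_variance:.1f}%")
```

The values produced this way agree with the tabled figures to within rounding in the third decimal place.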


5. Beta Weights. The beta weights were also found by the
Doolittle as well as the Wherry-Doolittle methods. Their values
are shown below:

Test                        Beta Weights

Technical Reading           .318
Paper Form Board            .240
Shop Arithmetic             .153
Mech. Comprehension         .137

6. Regression Equation. The regression equation for predicting
the criterion from the four subtests of the General
Mechanical Aptitudes Test may be written in standard form as
follows:

$$z_0 = .318\,z_B + .240\,z_C + .153\,z_D + .137\,z_A$$

In order to get the equation in score form, the β's were
transformed into the corresponding b's. The resultant equation,
expressed in terms of deviations from the mean, is as follows:

$$x_0 = 4.844\,x_B + 3.667\,x_C + 4.571\,x_D + 1.487\,x_A$$

with a standard error of estimate of 64.08.
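
The conversion from a beta weight to a score-form weight can be illustrated with a brief sketch. It assumes the usual relation b = β · (SD of criterion / SD of test), and it takes the Mechanical Comprehension SD as 7.704 and the criterion SD as 83.650, as Table 1 appears to give them; the standard error of estimate is computed from the unshrunken R of .644, which is also an assumption about how the reported figure was obtained.

```python
import math


def beta_to_b(beta: float, sd_criterion: float, sd_test: float) -> float:
    """Convert a standard-score weight to a raw-score weight."""
    return beta * sd_criterion / sd_test


def standard_error_of_estimate(sd_criterion: float, multiple_r: float) -> float:
    """Standard error of estimate for predicting the criterion."""
    return sd_criterion * math.sqrt(1.0 - multiple_r ** 2)


b_mech_comp = beta_to_b(0.137, 83.650, 7.704)
see = standard_error_of_estimate(83.650, 0.644)
print(f"b for Mechanical Comprehension: {b_mech_comp:.3f}")
print(f"standard error of estimate:     {see:.2f}")
```

Both results fall close to the reported 1.487 and 64.08; small discrepancies are to be expected from rounding in the published beta weights and R.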

Conclusions 

1. In general, the General Mechanical Aptitudes Test shows
fairly high validity for the prediction of academic success in
the basic technical courses in an industrial or technical high
school. The multiple correlation, when corrected for attenuation,
was found to be .643. Thus, the General Mechanical Aptitudes
Test battery accounts for about 41 per cent of the
variance of the criterion.

When the Mechanical Comprehension Subtest is removed
from the battery, the multiple R, corrected for attenuation,
is .641 and the resultant three-test battery still accounts for
about 41 per cent of the variance of the criterion. Thus, for a
forty-five minute testing time, one can get a fairly good
indication of the probable success of a student in a technical high
school such as Saunders.

In the larger study, the efficiency of the battery for the
prediction of success in specific subjects such as mathematics,
science, shop, and theory, was studied. None of the multiple
R's in that study exceeded the multiple R found here. Only
one of them, the one predicting success in Physics, equaled the
multiple R found in this study.

2. From the Beta Weights one can conclude that the Technical
Reading Test contributes most heavily to the prediction
efficiency of the battery. There is so much technical reading
required in the science, mathematics, theory, and shop courses
of the technical high school that skill and speed in doing such
reading contributes heavily towards academic success in such a
school.

3. Saunders Trades School is generally similar in students,
curriculum, instructors, support, etc., to other industrial and
technical high schools in New York State. Therefore, one can
conclude that the General Mechanical Aptitudes Test is a valid
test for the prediction of success in an industrial and technical
high school in the state of New York. Since many states follow
the N. Y. State pattern of vocational education, one could
probably conclude that this test has similar validity for any
such industrial or technical high school.

4. Finally, since most of the graduates of the Saunders
Trades School go into their respective trades and specialties
upon graduation and are fairly successful in their work, the
writer would conclude that this test should be fairly valid for
the selection of employees for the various mechanical jobs for
which training is given in this school. In other words, from this
study one can infer that the General Mechanical Aptitudes
Test should be valid for the selection of machinists, machine
designers, electricians, power plant operators and technicians,
junior industrial chemists, and junior architects.



THREE AIDS IN THE EVALUATION OF THE 
SIGNIFICANCE OF THE DIFFERENCE 
BETWEEN PERCENTAGES 

C. H. LAWSHE and P. C. BAKER
Purdue University 

Those who construct and use paper-and-pencil tests are 
confronted with the task of making a lengthy item analysis. 
Many authors have offered various devices to aid in this work. 
All of these serve the purpose for which they were intended; 
however, there are within and among them several weaknesses, 
viz: 

1. They may not be truly time-saving. A considerable 
amount of additional computation may be required. 

2. They may give only gross approximations to the desired 
values. 

3. They may give results in terms of a statistic whose
sampling error distribution is unwieldy.

We here offer three instruments to be used in the evaluation
of the significance of the difference between two percentages:
Table 1, “The Significance of the Difference Between
Percentages”; Table 2, “The Omega Equivalent to a Percentage”;
Figure 1, a nomograph to estimate the significance of the
difference between percentages.

These instruments were devised with certain criteria in mind:

1. The amount of calculation required shall be minimal.

2. Restrictions placed upon the data shall be minimal.

3. The results obtained shall be accurate to a degree
commensurate with published results.

4. The results obtained shall be subject to a “standardized”
interpretation. There shall be no ambiguity.

5. The instruments shall be compact and easy to use.

Table 1. Table 1 is the result of a direct approach to the
usual formula for the critical ratio of the difference between
two percentages to the standard error of that difference when
the size of the samples upon which the two percentages are
based are equal. (It is applicable only when $N_1 = N_2$.)

$$t = \frac{p_1 - p_2}{\sqrt{\dfrac{p_1 q_1}{N} + \dfrac{p_2 q_2}{N}}} \tag{1}$$

$$\frac{t}{\sqrt{N}} = \frac{p_1 - p_2}{\sqrt{p_1 q_1 + p_2 q_2}} \tag{2}$$

Let the right-hand member of equation 2 equal Theta. Theta
could be evaluated for all combinations of $p_1$ and $p_2$, but this

TABLE 2

The Omega Equivalent to a Percentage
(Omega positive for p > .50, negative for p < .50)

  p       Ω        p      |    p       Ω        p

 1.00    1.1106   .00     |   .75     .3702    .25
  .99     .9690   .01     |   .74     .3541    .26
  .98     .9099   .02     |   .73     .3379    .27
  .97     .8648   .03     |   .72     .3221    .28
  .96     .8159   .04     |   .71     .3064    .29
  .95     .7917   .05     |   .70     .2910    .30
  .94     .7602   .06     |   .69     .2756    .31
  .93     .7320   .07     |   .68     .2612    .32
  .92     .7051   .08     |   .67     .2453    .33
  .91     .6797   .09     |   .66     .2303    .34
  .90     .656?   .10     |   .65     .2155    .35
  .89     .6326   .11     |   .64     .2006    .36
  .88     .6104   .12     |   .63     .1861    .37
  .87     .5891   .13     |   .62     .1714    .38
  .86     .5697   .14     |   .61     .1568    .39
  .85     .5472   .15     |   .60     .1424    .40
  .84     .5287   .16     |   .59     .1280    .41
  .83     .5096   .17     |   .58     .1137    .42
  .82     .4911   .18     |   .57     .0994    .43
  .81     .4728   .19     |   .56     .0851    .44
  .80     .4550   .20     |   .55     .0708    .45
  .79     .4375   .21     |   .54     .0567    .46
  .78     .4202   .22     |   .53     .0424    .47
  .77     .4033   .23     |   .52     .0283    .48
  .76     .3867   .24     |   .51     .0141    .49
  .75     .3702   .25     |   .50     .0000    .50


would result in a table much too large for practical use; hence,
we have limited ourselves in Table 1 to combinations of $p_1$
and $p_2$ which are multiples of five.

To use Table 1 it is necessary only to multiply the tabled
value of Theta by the square root of N to find the critical
ratio.

$$t = \Theta\sqrt{N} \tag{3}$$

Table 1 is useful in classifying a large number of differences
into three categories: (1) definitely significant, (2) doubtful,
(3) definitely not significant. Those differences falling in the
doubtful category may then be more carefully evaluated by
means of equation 1.
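
The relation between Theta and the critical ratio can be sketched as follows, assuming equal sample sizes as the text requires. The function names are illustrative; the percentages and N in the example are hypothetical values, not data from the article.

```python
import math


def theta(p1: float, p2: float) -> float:
    """Theta for two proportions: (p1 - p2) / sqrt(p1*q1 + p2*q2)."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return (p1 - p2) / math.sqrt(p1 * q1 + p2 * q2)


def critical_ratio_equal_n(p1: float, p2: float, n: int) -> float:
    """Critical ratio t = Theta * sqrt(N), valid only when N1 = N2 = N."""
    return theta(p1, p2) * math.sqrt(n)


# Hypothetical item: 60% of the upper group and 40% of the lower
# group pass, with 100 cases in each group.
t = critical_ratio_equal_n(0.60, 0.40, 100)
print(f"Theta = {theta(0.60, 0.40):.4f}, t = {t:.2f}")
```

Because Theta depends only on the two percentages, it can be tabled once and reused for any N, which is the economy the table is meant to provide.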

Table 2. Further consideration of the inaccuracies inherent
in Table 1, due to the skewness of the sampling distribution of
p when the true value of p approaches 100% or 0%, led to the
development of a statistic which is a function of p and which
has a constant standard error dependent only on the size of
the sample. Readers familiar with Fisher's arctanh transformation
of r will recognize the utility of such a statistic. Kelly
[citation and page numbers illegible] offers the development of
such a statistic; our development differs only in minor details.

$$\Omega = f(p)$$ (Omega is a function of p)

$$d\Omega = f'(p)\,dp$$ (First derivative)

$$\sigma_\Omega^2 = [f'(p)]^2\,\sigma_p^2$$ (Take variance of both sides)

$$[f'(p)]^2\,\frac{pq}{N} = \frac{1}{N}$$ (Let variance error be inversely proportional to N)

$$f(p) = \int \frac{dp}{\sqrt{pq}}$$ (Integrate)

$$f(p) = 2\arcsin\sqrt{p} + C$$

$$\Omega = 2\arcsin\sqrt{p} + C$$ (The desired function)

To test the significance of the difference between two
percentages we need only to transform them to Omegas and apply
the critical ratio formula:

$$t = \frac{\text{difference}}{\text{S.E. of difference}} = \frac{(2\arcsin\sqrt{p_1} + C) - (2\arcsin\sqrt{p_2} + C)}{\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}}$$

$$t = \sqrt{\frac{2N_1N_2}{N_1 + N_2}}\left[\sqrt{2}\left(\arcsin\sqrt{p_1} - \frac{\pi}{4}\right) - \sqrt{2}\left(\arcsin\sqrt{p_2} - \frac{\pi}{4}\right)\right] \tag{4}$$

The reason for the algebraic manipulation, factoring out the
square-root of two in the numerator, will be clear from the
following. The constant of integration is evaluated as
$-\sqrt{2}\,\pi/4$ in order to make the function symmetrical about 50%.

[Fig. 1. Nomograph to estimate the significance of the difference between percentages.]

If $N_1$ equals $N_2$ formula 4 reduces to

$$t = \sqrt{N}\,(\Omega_1 - \Omega_2)$$

Omega is now defined as

$$\Omega = \sqrt{2}\left(\arcsin\sqrt{p} - \frac{\pi}{4}\right)$$

Table 2 contains values of Omega for all values of p. These
values are positive for p greater than 50% and negative for p
less than 50%.

For simplicity in notation we here define Omega lower case
as the difference between two Omegas:

$$\omega = \Omega_1 - \Omega_2$$
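
The Omega transformation and the resulting critical ratio can be sketched in a few lines. This is an illustration under the definition Omega = sqrt(2) * (arcsin(sqrt(p)) - pi/4) and the multiplier sqrt(2*N1*N2 / (N1 + N2)); the function names and the example percentages and sample sizes are hypothetical.

```python
import math


def omega(p: float) -> float:
    """Omega equivalent of a proportion p (0 <= p <= 1)."""
    return math.sqrt(2.0) * (math.asin(math.sqrt(p)) - math.pi / 4.0)


def critical_ratio(p1: float, p2: float, n1: int, n2: int) -> float:
    """Critical ratio for the difference between two proportions,
    using the Omega transformation; sample sizes may differ."""
    multiplier = math.sqrt(2.0 * n1 * n2 / (n1 + n2))
    return (omega(p1) - omega(p2)) * multiplier


print(round(omega(0.75), 4))  # 0.3702, matching the tabled value for p = .75
# Hypothetical comparison: 75% of 80 cases against 50% of 120 cases.
print(round(critical_ratio(0.75, 0.50, 80, 120), 2))
```

When the two samples are of equal size the multiplier reduces to the square root of N, so the same function covers both columns of use described in the text.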

TABLE 3

The 1%, 5%, and 10% Confidence Values of Theta and Omega

Level of       Theta with Table 1     Omega with Table 2 and with Figure 1
Confidence     (N's are equal)        (N's are equal)     (N's are unequal)

1%             Θ = 2.5758 / √N        ω = 2.5758 / √N     ω = 2.5758 / √(2N₁N₂ / (N₁ + N₂))

5%             Θ = 1.9600 / √N        ω = 1.9600 / √N     ω = 1.9600 / √(2N₁N₂ / (N₁ + N₂))

10%            Θ = 1.6449 / √N        ω = 1.6449 / √N     ω = 1.6449 / √(2N₁N₂ / (N₁ + N₂))


To use Table 2, find the Omega equivalents of the two
percentages, find the algebraic difference between the two Omegas,
and multiply this difference by $\sqrt{2N_1N_2/(N_1 + N_2)}$ if the
two samples differ in size, or multiply the difference by $\sqrt{N}$
if the samples are of equal size. This yields a critical ratio which
can be evaluated in terms of the normal probability function, or
Student's distribution of t if $N_1 + N_2$ is less than 30, in which
case the degrees of freedom associated with t is equal to
$N_1 + N_2 - 2$.

$$t = \omega\sqrt{\frac{2N_1N_2}{N_1 + N_2}}$$


Figure 1. Figure 1 is a graphic representation of Table 2.
To use this nomograph, find $p_1$ and $p_2$, and join these points
by a straightedge. Where the straightedge crosses the center
scale find omega (ω). This value is identical with that found
from Table 2, i.e., $\omega = \Omega_1 - \Omega_2$, and is used in the same manner.
This nomograph is of use when a large number of differences
are to be evaluated and classified as significant, doubtful, and
not significant.

A Shortcut —When a great many differences are to be evalu¬ 
ated, as in an item-analysis study, the following shortcut is 
suggested. Instead of multiplying each Theta or Omega value 
by the square-root of N, find the Theta value or Omega value 
corresponding to critical ratio values significant at various 
desired levels of confidence by performing the operations sug¬ 
gested in Table 3, “The 1%, 5%, and 10% confidence values 
of Theta and Omega.” 

Figure 1 can be converted into a “tailor-made” nomograph
for one particular study by marking the 1%, 5% and 10%
confidence values of Omega on the center scale, and by writing
frequency values corresponding to proportions of $N_1$ and $N_2$
along the $p_1$ and $p_2$ scales.

Summary 

Three instruments, two tables and a nomograph, to be used 
in the evaluation of the significance of the difference between 
two percentages have been offered. The research worker with 
a large number of differences to evaluate can, with three simple
calculations, determine which of his items attain the 1%, 5%,
or 10% levels of significance.



A STUDY OF FAKING ON THE KUDER PREFERENCE
RECORD¹


ORRIN H. CROSS 
University of Alabama 

Since vocational guidance counselors and employment offices 
of industrial concerns use it so frequently, the author of the 
present paper felt that the possibility of faking the Kuder 
Preference Record needed investigation. Examination of the
items of this inventory reveals many which apparently would
be quite transparent even to the average individual. This
possibility throws some doubt on the advisability of using it except
when wholehearted cooperation of the subject is assured.

A recent paper by Longstaff (1) on a similar problem has
indicated that both the Kuder and the Strong Vocational
Interest Blank for Men are susceptible to multiple faking, i.e.,
faking upward on some of the scales and downward on the
remaining ones. The present study differs from Longstaff's
in several ways. In the first place, Longstaff's subjects were
mature students in an evening Extension Division class in
Vocational Development and Personnel Psychology at the
University of Minnesota, presumably somewhat sophisticated in
psychological test taking; the subjects for this study were drawn
from a high-school group, probably quite unsophisticated
psychologically. Secondly, Longstaff had his subjects attempt
multiple faking, upward on the mechanical, scientific, artistic,
literary, and musical scales of the Kuder, downward on the
remainder; the subjects for the present study were instructed to
fake either up or down on just one scale at a time. Finally,
check studies were made in the present case to determine
whether previous acquaintance with the test might have been
a factor in scores and also whether differences in age and
education might have been a factor.

¹This research was supported in full by a grant from the Research Committee of
the University of Alabama. Papers based on part of these data were presented at the
1949 meetings of the APA and the Southern Society for Philosophy and Psychology.

Procedure 


The construction and standardization of the test is reported
elsewhere [reference numbers illegible] and consequently will not
be reviewed here.

Two short preliminary studies were done in a small southern
high school, both of them on one scale (the Mechanical), with
one sex (male). In the first study, all seventh- and eighth-
semester students who could find time to take the inventory
were tested. The highest ten males on the Mechanical scale
were then asked to re-take the test with the instructions to
fake a low interest in the mechanical field of work. Several
of the lowest ten also re-took it with instructions to fake high
interest. Both groups were successful.

In the second preliminary study, the procedure was varied
in that the students were asked to fake a high mechanical
interest prior to any acquaintanceship with the test. The
difference obtained between this group of [?]6 boys and that of the
Kuder norm group of high-school boys proved to be significant
at the [level illegible] level (actually there were fewer than 6
chances in 100,000 that such a difference could occur by chance).

The study being reported here used the method of the first
preliminary study,² i.e., (1) honest test administered by school
authorities, (2) selection of high and low scoring individuals
on each scale, (3) retest with instructions to "fake" an interest
in the opposite direction.³ The subjects had not been informed
of the results of the honest test.

²In order to secure the cooperation of the high-school authorities, this procedure
had to be followed.

³A copy of the directions to fake follows:

Directions

The inventory you did previously was done as a student, a non-employed student,
honestly giving a picture of his own interests. We would now like your cooperation
in doing this inventory as a person who was looking for a special kind of job might;
a person who might want to "fool" the test. We want you to help us find out if this
can be done.

Directions for Faking Low

You are now to pretend that your doctor has warned you that to take ____* work
would mean almost certain death. If you show high interest in that kind of work you
will be forced to take it in spite of this fact.

You must "slant" the test results so that you will not have to take this type of work;
so that you show as little interest in ____* work as you can. Thus you will score the
item of each trio which appears to you to be least indicative of ____* interest in column
headed "1", and the one most indicative of ____* interest in column "3"; make no mark
opposite the other activity.

Directions for Faking High

You are to pretend you very much want a particular job. If you show a large
amount of ____* interest on this test you have it "cinched". The job does not necessarily
involve ____* work, but you must show very high interest in such things.

You must "slant" the test results so that you will appear to have a great deal of
interest in ____* work. Thus you will score the one item in each trio which appears
to you to be most indicative of ____* interest in the column headed "1"; and the one
least indicative of ____* interest in the column headed "3"; make no mark opposite
the other activity.

*The name of the scale being faked was inserted in each of the blanks. At the
end of the directions was appended a list of the occupations chosen from Kuder's
lists for the scale being faked.

About six hundred (600) high-school students in four of the
five high schools of a large southern city took the honest test.
From this group were selected 364 students for retesting: 181
males and 183 females. It seemed desirable to compare within
the sexes because norms on the sexes differ.

A check study with college students from beginning
psychology classes was made, using the method of the second
preliminary study, i.e., fake test without previous experience
with the inventory. Means and standard deviations were
computed for a comparable college group from the same campus
for an honest taking of the inventory. Comparisons were then
made between the two college groups. This group was not
asked to fake low because it appeared to the author that faking
low would not be a significant problem in the situations in
which the results of the research would be useful. In the
guidance and industrial situations the peaks of a profile are
regarded as of positive significance, while the low scores are
commonly used for their negative value, if at all.

Finally, a group of 67 college students who had taken the
inventory were asked to rank the interest fields in order of
what they thought their test profiles would show. In this part
of the study, a list of the scales was presented, each one
followed by a list of from four to fifteen of the occupations listed
by Kuder in his Manual as being representative of the
occupations in that field of major interest.

Results and Discussion

It will be noted from Table 1 that the high-school students
were quite successful at the assigned task, the probabilities
that the observed differences were due to chance being less
than .01, with the exception of the females faking low on the
Persuasive scale. On this scale, the probabilities lay between
.02 and .01. If the results are corrected for a deviant case,⁴
who actually raised her score, this probability also drops to
less than .01. Each sex was compared to the Kuder Manual
norms as a measure of its faking ability. Neither sex failed to
fake high successfully on any one of the scales. On the other
hand, both sexes failed to successfully fake low on scales four
(Persuasive) and nine (Clerical), and the females also failed on
scale two (Computational). If these results are each corrected
for a single deviant case whose fake score proved to be higher
rather than lower on the retest, the probabilities drop to less
than .01 on all the scales except the Persuasive for females.
This probability is .075. Comparison of the sexes, scale by
scale, failed to reveal any significant differences between them.
The “t” values ranged between .090 and .681 for from 13 to 21
degrees of freedom.

College males and college females proved about equally
excellent at faking high when compared to the norms derived in
this study. For the male group the “t” values ranged between
3.93 and 25.41 (median = 12.09) for degrees of freedom
between 63 and 66 for the various scales. For the female group the
“t” values fell between 6.49 and 26.88 (median = 10.04) for
degrees of freedom between 128 and 134. Comparison of the
sexes revealed significant differences between them on the
Persuasive scale only, with the males showing superiority there.
The “t” value here was 6.353 for 21 degrees of freedom.

Finally, the high-school and college groups were compared.
No significant differences were evident here. The “t” values
obtained fell between 2.04 (for 14 degrees of freedom) and
.049 for from 14 to 21 degrees of freedom.

Inspection of the pertinent data (Table 1) indicates that
faking high is easier than faking low. The male sex faked high
better than low on all but the Mechanical scale; the female sex
faked high better than low on all but the Scientific, Social
Service, and Clerical scales. Faking high is a more important
ability in the situations in which the test is applied;
consequently this finding appears pertinent.

⁴The author assumed that when a subject instructed to fake low not only failed
to lower his score on the retest, but raised it, he was not following directions either
because he misunderstood or because he did not wish to cooperate. Critical ratios,
except where noted, were calculated with such deviant cases included.

Faking high appears to be somewhat easier for the college
group than for the high-school group, although the difference
does not reach the [level illegible] level of significance on any
scale for either sex. Differences favoring the high-school groups
occurred on the Computational and Artistic scales for the male
sex, and on the Persuasive and Artistic scales for the female sex.

The mean rank difference correlation of the college group 
which attempted to predict what its order of standing on the 
nine scales would be was +.67. This coefficient is significant at 
the .01 level. 
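
A rank-difference correlation of the kind reported here can be sketched as follows. The formula is the usual one, rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)); the two students' rankings of the nine scales are hypothetical and are not data from the study, which reports only the mean coefficient.

```python
def rank_difference_correlation(predicted: list[int], obtained: list[int]) -> float:
    """Rank-difference correlation between two rankings of the
    same n items (no tied ranks assumed)."""
    n = len(predicted)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(predicted, obtained))
    return 1.0 - 6.0 * sum_d_squared / (n * (n * n - 1))


# Hypothetical rankings of the nine Kuder scales by two students:
# (ranks they predicted, ranks their profiles actually showed).
students = [
    ([1, 2, 3, 4, 5, 6, 7, 8, 9], [2, 1, 3, 5, 4, 6, 9, 7, 8]),
    ([1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9]),
]
coefficients = [rank_difference_correlation(p, o) for p, o in students]
mean_rho = sum(coefficients) / len(coefficients)
print([round(c, 3) for c in coefficients], round(mean_rho, 3))
```

Averaging the per-student coefficients gives a single summary figure comparable in form to the mean value reported in the text.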

The uses to which such an inventory is put need examination.
First, it is used in vocational and educational guidance in
public schools, colleges, guidance centers, and the employment
services; and, second, it is used in the selection of workers for a
job in business and industry.

What do the results reported mean, then? In the first case,
guidance, rapport may be justifiably assumed. In this case,
such a fakable inventory retains usefulness. However, in the
second instance no such assurance of cooperation exists. On
the contrary, it might reasonably be assumed that testees
take the opposite attitude, consciously or unconsciously. On
the basis of the reported results, it might be asserted that such
an inventory as this must be interpreted with caution in the
industrial situation, at least where the applicant has any
inkling of the job he is being considered for.

Longstaff (1) has suggested the analysis of this inventory
after the fashion of the "K scale" of the Minnesota Multiphasic
Personality Inventory to seek a correction for faking. That was
the original intent of the study here reported, but analysis of
some of the data obtained failed to reveal enough items for
such a scale. Another possibility in selecting workers for a job
might be the use of the whole profile, on the assumption that
secondary (and less transparent) peaks, and low scores might
prove to be discriminating.

Conclusions

The results reported above appear quite consistent in their 
implications. In only one case did a group (in this study, the 
high-school females, faking low on the Persuasive scale) fail 
to perform the task successfully as compared to its own honest 
tests. Correction of this result for a deviant case brought that 
result to significance also. It thus appears that a subject
suitably motivated may successfully fake the Kuder Preference
Record.

As shown by the present study, when an applicant for a job 
has any idea of what job he is being considered for, his scores 
should be interpreted in the light of the knowledge that faking 
is possible if he desires to fake. In the properly motivated
guidance situation, this problem does not arise.



PSYCHOLOGICAL TESTING FOR IMMIGRANTS IN A
VOCATIONAL COUNSELING AGENCY¹

BENJAMIN BALINSKY

United Service for New Americans and City College of New York

The Vocational Services Department of the United Service
for New Americans aids recent immigrants to achieve voca¬ 
tional adjustment. There is no established testing program, but 
outside testing facilities have been utilized on occasion. The 
question arose about whether or not to increase the utilization 
of tests for the recent immigrants. Ordinarily, the matter would 
have been directly answered on the basis of precedent that tests 
are widely accepted by counseling agencies. However, since the 
Vocational Services Department has recent immigrant clients, 
the matter of testing them more regularly was enmeshed in the 
broader problems of test validity and interpretation. 

Design of Study 

It is known that the tests employed in counseling have been 
standardized on American-speaking and American-acculturated 
populations. The recent immigrants not only do not understand 
the American language idiom well but have had extraordinary 
personal experiences that make a testing program for them one 
that requires careful study. It was decided that the design of
the study have two phases:

1. To discover the particular needs of the clients and the
counselors who serve them.

2. To try out various psychological tests and techniques.

The first task has been completed. The second phase has only
begun. The first phase was accomplished by means of the
following:

1. Conferred with supervisors and counselors on client needs
and their own.

2. Observed interviews.

3. Interviewed directly, especially the more difficult clients.

4. Attended and participated in administrative staff meetings.

5. Studied case records.

6. Conferred with representatives of the Family Service
Department because of the close working relationship with
the Vocational Services Department.

¹This paper is adapted from a report read at the Jewish Occupational Council
Eastern Regional Conference, February [date illegible]. The writer wishes to express his
sincere appreciation to Mr. William [surname illegible], Director, Vocational Services Department,
for his invaluable aid in starting and working through the Psychological Services
Program.

The second phase will be accomplished by cooperating with 
outside testing facilities where the immigrants will be examined. 
The test results will be studied against interview, case history 
and vocational data, and the test battery modified from time to 
time as the results merit. 

Immigrants as Special Problems 

The question may be raised as to whether or not the recent 
immigrants present testing problems different from the usual 
client. From phase one of the study it was learned that: 

1. The recent immigrant is generally older: 42.5% were 40
years of age or older; 30% were 45 years of age or older;
22% were 50 years of age or older.

2. 82% had been in this country one year or less.

3. The largest number of applicants had been in business,
salesmen, or office workers in Europe.

4. 11% were handicapped by general health or a specific
physical impairment.

5. 70% were on relief and over 95% were known to the Family
Service Department at some time.

6. Almost all the recent immigrants had more or less difficult
social and personal adjustments to make at the same time
they were making a vocational adjustment.

7. Counselors wanted help with understanding the personality
of the particular immigrants. They felt that the immigrant
clients were more difficult to understand than the American
clients with whom they were familiar.

8. Counselors indicated that a routine interpretation of tests
was of little value.

These findings put the immigrant in a special class that may 
well be compared to the so-called handicapped groups. Just as 
one does not proceed on the same basis with the handicapped as 
with the non-handicapped, the same cautions must be exercised 
with the recent immigrant. One would not rely upon oral tests
for the deaf or written tests for the blind, and one must seek
tests that are more valid for the immigrants.

Testing the Immigrant 

In testing the immigrant, we run into the issue of mechanical
or dynamic testing and interpretation. Psychological tests have
been generally accepted as part of the total process of
vocational counseling. However, the use made of tests varies
considerably from more or less routine mechanical interpretation
of the results to the dynamic interpretation where predictions
are based on all there is known about measurement principles,
the particular test, and the particular individual being tested.
Where the individual has only a vocational problem
uncomplicated by difficult social and personal adjustments and where the
individual has had normal opportunities for development, a
prediction based upon the specific test results will probably be
valid. However, where this is not so, as in the instance of recent
immigrants, then the prediction must be based on more factors
than the specific test results.

Apropos of this issue I refer to the case of Hans K., as
reported in the Jewish Occupation Council, Program and
Information Service, Release. Hans was an immigrant
about 24 years of age. Tests had been given him twice. The first
time they pointed up the need for psychiatric referral. The second
time tests were given the statement was made that, “his general
pattern of abilities has not changed and he has not improved
much in abilities where learning power is involved, such as
vocabulary.” The statement continues, “on the basis of these test
results it appeared that Hans could not profit from formal training.
An occupation requiring either gross or precise manual dexterity,
speed and some accuracy would be most suitable for him.”

However, as a result of the psychiatrist’s statement that 
Hans might react with a neurotic or psychotic breakdown "if 
he could not anticipate his unrealistic vocational aspirations,” 
the results were reviewed. It was decided that typing and
bookkeeping might not be contra-indicated. Hans went on to make
a good adjustment in office work and even successfully accom¬ 
plished some part-time college work. 

Here predictions of success were based upon specific test
results without fully taking into consideration the language
barrier and the emotional complications. Hans had rated an IQ
of 137 on the Performance part of the Wechsler-Bellevue
Intelligence Scale and an IQ of 111 on the Full Scale but only an
IQ of 85 on the Verbal Scale, this test having been part of the
battery given previously. An IQ of 137 on the Performance part
shows a very high level of intelligence and indicates that the low
Verbal IQ is most likely not permanent but very probably
temporarily depressed. It should also have been known that
Hans was interested in office work. Considering all the test
results and what was known about Hans, the recommendation
for an occupation requiring either gross or precise manual
dexterity seems like clutching at straws. There seemed to be
incomplete evaluation of the test results, especially in terms of
Hans’ particular background and personality. The need for a
more dynamic interpretation was sharpened by the fact that
Hans was an immigrant with emotional difficulties.

It has often been remarked that the immigrant does not have 
different problems, but rather more of the same that every one 
else has. But more of the same, a quantitative difference, makes 
eventually for a qualitative change. A person may be anxious 
upon occasion, but another may be always anxious. This quan¬ 
titative difference makes for a different style of life, different 
kinds of adjustment. It calls for recognition by tests and by the 
counselor in evaluation and adjustment. Water will still be 
water at 99 degrees C. but at 100 degrees it will be steam and 
the properties will change. This qualitative change demands 
different handling. So it is with the recent immigrant. He may 
have a little more or much more anxiety, suffer more from the 
difference between his expectations and reality, have a greater 
tendency to conflict between the need to be independent and 
dependent. But because he has more, his problem is not only 
greater but different. He has less language facility, his home sit¬ 
uation is less favorable, he has had fewer opportunities to make 
adjustments on his own here and to see their results. Because of 
this, also, he reacts differently. 

Greater attention must be paid to the whole person in testing 
and evaluating him. We are making predictions on the basis of 
test results and these predictions must be based on all the
evidence. It is necessary to take into account: (1) theories and
principles of measurement, (2) the standardization of the tests
themselves and (3) the particular person being tested.

One of the major theories in measurement that is significantly
related to the testing of the immigrant is that of the effect of the
environment on present test abilities. There is ample evidence,
both experimental and clinical, which demonstrates that an
environment different from that of the group upon which the
test was standardized will lead to results that require
explanation. Since we are to predict the probability of adjustment to
new situations we must include the possibility of accelerated
growth when in a new environment, especially one that is more
favorable for growth in the expected direction. Specifically, for
the immigrant, this means we must be able to predict his ability
level after a period of time. After a while when he becomes more
accustomed to American ways, feels more secure and
understands the language better, his actual test results may rise.
We must be able to predict the approximate rise in the present
test results. And this we cannot do unless we take into account
all factors about the tests and the individual.

It is necessary to know the validity, reliability and the norm 
populations for each test in order to interpret the results on 
the tests. Validity is most important since, if a test does not
measure what it is supposed to, it matters not how consistently
it measures something else or from what population the norms
were derived. Moreover, the validity of tests is not so high as
to measure with infinitesimal error. Most intelligence tests have
validity coefficients of from about .80 to .90 and aptitude test
validity coefficients approximate .60 as the modal instance.
This means that the error of prediction may be quite large for 
any one individual. This error can be reduced by studying all of 
the test results in terms of the individual's present state and 
background. 
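
The size of the individual error implied by such coefficients can be
illustrated with the conventional standard error of estimate, the
criterion standard deviation multiplied by the square root of one minus
the squared validity coefficient. The sketch below is only an
illustration under that assumption; the function name is hypothetical,
the criterion standard deviation is arbitrarily set to 1, and no such
computation appears in the original discussion.

```python
import math


def standard_error_of_estimate(validity: float, criterion_sd: float = 1.0) -> float:
    """Standard error of estimate for a criterion predicted from one test.

    Uses the textbook relation SD * sqrt(1 - r^2); the default SD of 1.0
    is an illustrative assumption, so results read as fractions of the
    criterion standard deviation.
    """
    if not -1.0 <= validity <= 1.0:
        raise ValueError("validity must lie between -1 and 1")
    return criterion_sd * math.sqrt(1.0 - validity ** 2)


# Validity coefficients of the sizes mentioned in the text.
for r in (0.60, 0.80, 0.90):
    see = standard_error_of_estimate(r)
    print(f"validity {r:.2f}: error of estimate = {see:.2f} criterion SD")
```

Under these assumptions a validity of .60 still leaves an error of
estimate of .80 of the criterion standard deviation, which is consistent
with the caution urged for predictions about any one individual.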

When it comes to aptitude tests where the validity based on 
groups is usually only about .60, the cautions in making pre¬ 
dictions for an individual must be even greater. It may be 
necessary to add more tests to get at the patterns. It is im¬ 
portant to make observations of the individual while at work 
on the tests. It is valuable to know about interests and
experiences of the individual. Only then can the predictions begin
to approach significance.

Kinds of Tests

From present observations of the immigrant as a test¬ 
ing problem it seems that there are sufficient tests already 
available that are adaptable for the immigrants. It is not neces¬ 
sary to make new tests. Performance tests of aptitude can be 
administered with little difficulty. The language factor is mini¬ 
mized, the cultural factor is lessened and observations of 
method and behavior can be made to illumine the test results. 
We have a little experience already with some tests. Some of
our clients were examined at the YMCA Vocational Service
Center. We found that the usual paper-and-pencil mechanical
aptitude tests did not give as valuable information in filling out
the interview data as the performance type of test and those
which measured abilities more specifically like the Minnesota
Paper Form-Board. The performance tests which seemed good
were the Minnesota Spatial Relations Tests, Formboards A and B,
the Finger and Tweezer Dexterity Tests, the Purdue Pegboard
and the Placing and Turning Tests. The Wechsler-Bellevue
Intelligence Scale can be used effectively if the Verbal Part is
properly evaluated.

The language factor is not as important as is the cultural.
For instance, a direct translation of the Wechsler-Bellevue
will still have peculiarly American items like George
Washington’s birthday, the height of the average American woman,
some of the Picture Arrangement items and Picture
Completion items. The paper-and-pencil mechanical aptitude tests
have many items strangely unfamiliar, not only to immigrants, 
but to many of us. These kinds of tests are contra-indicated. 

Clerical tests, like the Minnesota Clerical Test, may be
administered to those immigrants who have interest in clerical
work and are able to read and write English. The Kuder Pref¬ 
erence Record seems preferable to the Strong Vocational Interest 
Blank for our groups. 

For personality description, the projective tests, like the
Rorschach, would be possible. The Rorschach has been
successfully used for diagnostic purposes with immigrants. There is
some question as to its use for vocational purposes; that is, in
terms of obtaining behavioral descriptions that would predict
how a person would function in different work conditions. This
use of the Rorschach deserves much more research. In fact the
Vocational Services Department is contemplating the use of
several projective techniques to get at the personality
attributes and to indicate their functioning in terms of
vocational goals.

The present norms on the tests can be used while immigrant
norms are being developed. The immigrant norms, however,
will have to be validated against the degree of success in
training or at work. In this way scores on the immigrant norms can
be related to the standard norms. The establishment of
immigrant norms should be used as a statistic to improve the
accuracy of prediction. But it cannot take the place of the holistic
or clinical evaluation and interpretation of the test results. 
Finally, the test results need to be carefully integrated with 
the subsequent interviews by counselors. 



AN INVESTIGATION OF THE PERSONALITY TRAITS 
OF ART STUDENTS 1 

MARTIN SPIAGGIA 
City College Vocational Advisement Unit 

Introduction 

Many opinions have been voiced concerning the nature of 
artistic persons. Predominant among these is the belief that 
artists are emotionally unstable. Lombroso, a nineteenth
century psychiatrist, is cited by Rank (27) to have advanced a
theory on the “insanity of genius” which treated features
departing from the normal as “pathological.” Psychoanalysis
also, as Rank shows, has tended either to identify the artist 
with the neurotic—particularly in Sadger’s and Stekel’s argu¬ 
ments—or to explain the artist on the basis of inferiority feel¬ 
ings, as in Adler’s school of thought. 

Whether there is any factual basis in this “abnormal” point 
of view, or whether it has been merely a manifestation of the 
universal tendency to ascribe weakness and idiosyncrasy to the
highly gifted, has not yet been experimentally determined. 
The bulk of published psychological experimentation in this 
realm concerns itself with the relationships between artistic 
ability and such factors as intelligence (3, 33), perceptual fa¬ 
cility (15), and creative imagination (17, 18, 19). Little work 
has been done, however, in studying the personality of the 
artist. Previous studies which appear relevant to the research 
at hand are described briefly below. 

Data gathered on several hundred college students at the 
University of Minnesota (2) indicated no significant relation¬ 
ship between ability in art and introversion, submissiveness, 
or emotional instability. The Bathurst Diagnostic Temperament 
Test and the Bernreuter Personality Inventory were used; ability
in art was measured by the Meier-Seashore Art Judgment Test
and the McAdory Art Test, supplemented by the judgment of
instructors.

1 This study was submitted in partial fulfillment of the requirements for the degree
of Master of Arts at New York University. For valuable help and criticism, the writer
is indebted to Dr. Naomi Stewart, who sponsored the study.

In a study by Fleming of 84 girls at the Horace Mann School
for Girls Mr) rating scales were used for determining who the
“artistic” girls were. These ratings were correlated with
teachers’ estimates of various personality traits. The coefficient of
contingency between “talented in some field of art” and
“personality,” as rated by the teachers, was found to be .15, on
the basis of which Fleming argues for a “definite tendency for
those with artistic talent to possess what is commonly called
personality.” No explanation is given of what is “commonly
called personality.”

Prados, using the Rorschach, found the following common
features among 20 professional artists (26): superior
intelligence, fear of mediocrity and disregard for the routine problems
of everyday life, strong drive for achievement and richness of
the inner interests, and pronounced sensitiveness and
emotional responsiveness to the outer world along with a lack of
adaptability to it, the last-mentioned feature tending to be
counterbalanced by sound intellectual control.

Roe, in a study of 20 prominent American male painters 
(28, 29), found them to be sensitive, non-aggressive, emotionally 
passive, hard working, self-disciplined, and of superior intel¬ 
ligence. She found nothing in the personalities or intellectual 
powers of her subjects, as measured by the Rorschach and 
Thematic Apperception tests, that was radically different in a 
qualitative sense from those of other people. She found,
however, that the type of social and sexual adaptation was of a
markedly non-aggressive sort and hence rather more “feminine”
than “masculine” according to our cultural stereotypes.

The object of the present study was to investigate differences
in personality traits, as measured by the nine scales of the
Minnesota Multiphasic Personality Inventory, between art
students and non-art students matched with them on age and
intelligence.

Population 

Art Students. The subjects, all volunteers, were 50 male
art students, age 18 or above, who had attended a recognized
art school in New York City (excluding commercial-art schools)
for at least two years, and who intended to make art work their
vocation.

Control Group of Non-Art Students. The control group was
composed of 50 male subjects who were not art students, and 
were selected randomly from the general population in New 
York City, and in Rockland and Orange Counties of New York 
State. Some of the occupations included were hospital atten¬ 
dant, automobile mechanic, electrician, shoemaker, chauffeur, 
teacher, accountant, and graduate student. 

TABLE 1

Comparison of Minnesota Multiphasic Personality Inventory Results Obtained on 50
Art Students and 50 Non-Art-Student Controls Matched on Age and Otis IQ

                                                           Art Student
                             Art Students        Controls   Mean minus     S.E. of
Variable                      Mean     SD     Mean     SD Control Mean  Difference   t ratio

Age (years last birthday)    24.64   6.16    24.62   5.47         +.02          --        --
Otis IQ                     111.56  10.49   112.68  10.82         -.33          --        --
Hypochondriasis              52.56  10.06    51.66   9.00         +.90        1.93       .47
Depression                   56.74  10.79    53.16   5.28        +3.58        1.66      2.15*
Hysteria                     59.24   7.91    57.74   7.43        +1.50        1.51       .99
Psychopathic Deviate         59.14  13.40    50.28   4.96        +8.86        1.97      4.49†
Interest                     70.10  11.82    55.92   5.86       +14.18        1.86      7.62†
Paranoia                     54.00   7.17    47.18   5.91        +6.82        1.36      5.01†
Psychasthenia                53.88  10.26    50.45   6.54        +3.43        1.67      2.05*
Schizophrenia                55.90  10.74    49.46   4.71        +6.44        1.61      4.00†
Hypomania                    61.38  10.93    53.32   5.56        +8.06        1.72      4.69†

* Significant at the 5% level of confidence.
† Significant at the 1% level of confidence.


Testing was conducted at the Psychology Laboratory of New 
York University between July, 1947, and July, 1948. 

Each art student was matched with a control on the basis of 
chronological age and Otis IQ (25): within 3 points on age and 
5 points on IQ. The mean age for Art Students and Controls
was 24.6; the sigma on age was 6.2 for Art Students and 5.5
for Controls. On Otis IQ, the mean and sigma for Art Students
were 111.6 and 10.5; the Otis IQ mean and sigma for Controls
were 112.7 and 10.8.

Procedure 

Raw scores for the nine scales of the Minnesota Multiphasic 
Personality Inventory (10, 11, 12, 13, 14) were obtained for
each subject. Standard score equivalents, or T-scores, were
determined, full account being taken of the supplementary
scores, that is, the Lie, Question, and Validity scores.

On each of the nine Multiphasic scales the difference in
standard score was obtained for each art student and his
non-art-student control matched for age and IQ. These T-score
differences were distributed and the mean and sigma of each 
distribution of differences obtained. 

An estimate of the standard error of the mean of each set of 
differences was then computed by dividing the standard devia¬ 
tion of each distribution of differences by the square root of 
N-1, thus allowing for the correlation in scores for the two
groups introduced by the matching. A t-ratio was then com¬ 
puted for each variable. 
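
The computation just described can be sketched as follows. This is a
minimal illustration, not the author's own worksheet: the function name
is hypothetical, the sigma of the differences is assumed to be computed
with N in its denominator (so that dividing by the square root of N-1
matches the usual paired-difference standard error), and the five pairs
of T-scores are invented for the example.

```python
import math


def paired_t_ratio(art_scores, control_scores):
    """Return (mean difference, standard error, t-ratio) for matched pairs."""
    if len(art_scores) != len(control_scores):
        raise ValueError("each art student needs exactly one matched control")
    n = len(art_scores)
    if n < 2:
        raise ValueError("at least two matched pairs are required")
    # Difference in T-score within each matched pair.
    diffs = [a - c for a, c in zip(art_scores, control_scores)]
    mean_diff = sum(diffs) / n
    # Sigma of the distribution of differences, N in the denominator
    # (an assumed convention; the text does not spell it out).
    sigma = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / n)
    # Standard error of the mean difference: sigma / sqrt(N - 1).
    std_error = sigma / math.sqrt(n - 1)
    if std_error == 0:
        raise ValueError("differences are constant; the t-ratio is undefined")
    return mean_diff, std_error, mean_diff / std_error


# Hypothetical T-scores for five matched pairs (not data from the study).
art = [62, 55, 70, 58, 66]
controls = [54, 57, 61, 50, 59]
mean_diff, std_error, t = paired_t_ratio(art, controls)
print(f"mean difference {mean_diff:.2f}, standard error {std_error:.2f}, t = {t:.2f}")
```

Because the standard error is taken from the pair differences rather
than from the two groups separately, any correlation introduced by the
matching is already reflected in it.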

Results and Discussion

Table 1 gives all pertinent data. As can be seen from this
table, the art students were significantly higher than the
controls in mean scores on the Depression, Psychopathic Deviate,
Interest, Paranoia, Psychasthenia, Schizophrenia, and
Hypomania Scales of the Minnesota Multiphasic Personality
Inventory. These differences were significant at the one per cent
level for all scales mentioned except the Psychasthenia and
Depression Scales, where the differences were significant at the
5 per cent level.
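
As a rough check on these statements, the t-ratios can be recomputed
from the mean differences and standard errors given in Table 1 and set
against tabled critical values. The sketch below is an illustration
only: the critical values of about 2.01 and 2.68 are the approximate
two-tailed points of t for 49 degrees of freedom from standard tables,
and both the two-tailed test and the names used are assumptions not
stated in the paper.

```python
# (mean difference, standard error of the difference) transcribed from Table 1.
TABLE_1 = {
    "Hypochondriasis": (0.90, 1.93),
    "Depression": (3.58, 1.66),
    "Hysteria": (1.50, 1.51),
    "Psychopathic Deviate": (8.86, 1.97),
    "Interest": (14.18, 1.86),
    "Paranoia": (6.82, 1.36),
    "Psychasthenia": (3.43, 1.67),
    "Schizophrenia": (6.44, 1.61),
    "Hypomania": (8.06, 1.72),
}

# Approximate two-tailed critical values of t for 49 degrees of freedom
# (assumed from standard t tables; the paper does not quote them).
CRITICAL_5_PERCENT = 2.01
CRITICAL_1_PERCENT = 2.68


def significance_level(mean_diff: float, std_error: float) -> str:
    """Classify a mean difference by the size of its t-ratio."""
    t = abs(mean_diff / std_error)
    if t >= CRITICAL_1_PERCENT:
        return "1% level"
    if t >= CRITICAL_5_PERCENT:
        return "5% level"
    return "not significant"


for scale, (mean_diff, std_error) in TABLE_1.items():
    t_ratio = mean_diff / std_error
    print(f"{scale:<22} t = {t_ratio:5.2f}  {significance_level(mean_diff, std_error)}")
```

On these assumptions the classification agrees with the text: seven
scales reach significance, with Depression and Psychasthenia at the 5
per cent level and the remaining five at the 1 per cent level.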

If we can safely generalize from the findings of the present 
paper, these results suggest that the art student, as compared 
to the non-art student of similar age and intelligence, is more 
typically introverted, exhibits a greater tendency toward de¬ 
pression, possesses a tendency to disregard social mores or an 
inability to adjust to the outer world, and is more feminine 
in his basic interest pattern. Further, he tends toward over- 
productivity in thought and action, these being of unusual 
character, and also toward compulsive behavior. 

Several factors must, however, be considered in interpreting 
these findings. Concerning the Interest Scale, on which the art
students were found to score significantly higher, we must take
heed of the caution by the authors that homosexuality must
not be assumed on the basis of a high score without confirmatory
evidence, owing to the relatively low reliability of this scale.
Burton (1) administered the Interest Scale to no rapists, 34 
sexual inverts and 84 other delinquents. Although he found
significant differences between inverts and rapists, and also
between inverts and delinquents who were sexually normal, 
on retest of 34 cases the reliability coefficient was found to be 
only .70. 

The fact that Interest scores are related to cultural factors 
(31) must also be taken into account in interpreting the
Interest findings. Roe, for example, in the study previously
mentioned (28, 29), interprets the “feminine” type of sexual
adaptation of a group of male artists as reflecting the attempt on
the part of our society to maintain one acceptable male
stereotype.

The high mean on the Paranoia Scale would seem, however, 
to add weight to the significance of the high mean on the In¬ 
terest Scale, in light of current psychoanalytic theory which 
stresses the partial failure of repression of homosexual
tendencies in the psychogenesis of paranoia (4). Ferenczi (5) goes so
far as to consider paranoia as distorted homosexuality. Hender¬ 
son and Gillespie (13), however, describe eleven cases of par¬ 
anoia in only four of which the etiology of the paranoia was in 
agreement with Freudian conceptions. They claim that the 
causation of paranoid conditions is probably not by any means 
uniform, but that type of personality is one of the commonest 
predisposing elements. The sensitive, introverted individual, 
such as was found common in the art-student group, is men¬ 
tioned as one of the types particularly susceptible to paranoia. 

The significantly high scores of the art students on the 
Schizophrenia and Psychasthenia Scales appear readily inter¬ 
pretable. It would seem likely that by virtue of his higher 
“cultural” level, the art student encounters difficulty in ad¬ 
justing to the outer world and finds it psychologically necessary 
to turn inward, appearing introverted, and giving rise to the 
high Schizophrenia mean. The tendency toward compulsive 
behavior, shown by the art students on the Psychasthenia Scale,
is due in part to the overlapping of items and the high
correlation between the Schizophrenia and the Psychasthenia Scales
(.84 for normals, .7* for abnormal cases). It may also reflect
a real tendency toward compulsive behavior on the part of the
art student group.

The high mean score for the art students on the Hypomania
Scale may appear inconsistent with the high mean for this
group on the Depression Scale, and with the introvertive
pattern which seems to typify the group. It must be remembered,
however, that the Hypomania Scale presumably measures
overproductivity of thought as well as action. It seems plausible
that the high Hypomania mean for the art students is
accountable in terms of overproductivity of thought; that because of
their introvertive tendency they express these thoughts in
symbolic forms rather than in action in the ordinary sense of the
word.

The relatively high mean score of the art-student group on
the Psychopathic Deviate Scale is contrary to expectations, in
light of the introvertive pattern manifested for this group,
since behavior of the psychopathic deviate variety is usually
associated with extrovertive tendencies. The relatively high
Psychopathic Deviate mean score is, however, consistent with
the ordinary layman’s stereotype of the artist. It must also
be considered that while the Multiphasic appears adequate for
giving a general over-all pattern of group behavior, it loses
in validity when an attempt is made to interpret findings on any
given scale taken in isolation.

Further caution is prescribed in interpreting the results
discussed here. Owing to the preliminary status of some of the
Multiphasic Scales, the overlapping of items among the various
scales, and the lack of experimental determination of reliabilities,
the Multiphasic is still in an incomplete state of
development.

Note must also be taken of the limitations of the present
study with respect to sampling. The art students were all from
professional art schools in New York City and cannot be taken
to represent art students throughout the country. The number
of cases, while sufficient to yield statistically significant
differences for many of the comparisons made, is also very small
in an absolute sense. The differences, therefore, while
significant, are not highly reliable.

Summary 

Differences in personality traits between art students and 
non-art students matched with them on age and intelligence, 
as measured by the nine scales of the Minnesota Multiphasic 
Personality Inventory, have been investigated. The findings
reveal significantly higher mean scores for the art students on
the Depression, Psychopathic Deviate, Interest, Paranoia,
Psychasthenia, Schizophrenia and Hypomania Scales of the
Multiphasic. These findings seem, on the whole, to be
psychologically meaningful.

Owing to the selective character of the sample used and to 
the inadequacies of the Multiphasic as a tool for personality 
diagnosis, caution is indicated in interpreting these results.

Further study of the personality characteristics of different 
vocational and social groups is recommended. On the basis of 
the present findings, it would seem that investigations along 
such lines can afford material aid to the understanding of 
various social problems.



THE KNOWLEDGE OF GENERAL EDUCATION OF A
SAMPLE OF SYRACUSE UNIVERSITY STUDENTS AS
REVEALED BY THE COOPERATIVE GENERAL CULTURE
TEST AND THE TIME MAGAZINE CURRENT
AFFAIRS TEST

N. M. DOWNIE
The State College of Washington

M. E. TROYER and C. R. PACE
Syracuse University

During the academic year, 1947-1948, Syracuse University
initiated an all-university self-survey, the results of which were
to provide the bases for enlightened planning for the years
ahead. Among the concerns of the various survey committees 
was an investigation of the program of general education of 
the University. 

As a part of this study of general education, a sampling of
seniors, members of the Class of 1948, and of sophomores,
members of the Class of 1950, were given Form X of the
Cooperative General Culture Test and the September, 1947, edition of the
Time Magazine Current Affairs Test. These tests were
administered late in December of 1947 and during the first school
days of January, 1948. The following five Colleges of the Uni¬ 
versity had students participating in the program: Applied 
Science, Business Administration, Fine Arts, Home Economics 
and Liberal Arts. 

Raw test scores on the Ohio Psychological Examination , Form 
ll, were obtained for as many students as possible. Mean scores 
on this test were computed for each college and class. These
means were tested for significant differences by means of the
“t” test and the homogeneity of their variances by the “F”
test. No significant difference was found between the mean
score for all of the seniors and the mean score for all of the
sophomores. When the mean score for each college was
compared with the mean of the total seniors and with that of the
total sophomores, the only significant difference was found to
be that the mean score of the Liberal Arts sophomores was
significantly higher than that of all other sophomores.

An analysis of covariance technique was applied to the data 
to determine whether, if intelligence test scores were held con¬ 
stant, there was any difference between the total scores of the 
seniors and sophomores on the Cooperative General Culture Test. 
An “F” ratio of .784 was obtained. This led to the acceptance 
of the null hypothesis that there was no significant difference 
between the means of the two classes on the total scores of this 
test. The Welch-Nayer Test was used to check the assumption
of homogeneity of variances of the two groups. The variances 
were found to be homogeneous. 

When the test was analyzed by subtests and for the differ¬ 
ent classes in the five colleges, numerous significant differences 
appeared as shown in Table 1. The “t” test was used to test the
significance of the differences between the means. Variances
of each set of means were compared, using the “F” ratio. It was
found in two cases where significant “t’s” appeared—Current 
Social Problems, for the Applied Science and Home Economics 
students—that the real difference was caused by the variances 
of the two distributions. 

Table 1 shows the mean total and part scores by college and 
class on this test. On studying this table, one sees that, in gen¬ 
eral, the students of Syracuse University achieved well above 
the mean on national norms for college sophomores. As a matter 
of fact, of the eighty-four mean scores reported in this table, 
only ten are below the mean on national sophomore norms. 

On studying the six part-scores of the test for each college, 
one sees that in the College of Liberal Arts both seniors and 
sophomores were well above the all-university mean for each 
part, with the seniors and sophomores significantly different 
from it on Current Social Problems, History and Social Studies 
and Literature, and the seniors in Science. The seniors in
Applied Science achieved significantly above the all-university
mean in Science, Mathematics and Current Social Problems,
and significantly below it in Literature and Fine Arts. The
Applied Science sophomores were above the mean in Science
and Mathematics (significantly so), while on the other areas of
the test, they approximated it.

The seniors in the College of Business Administration fell
significantly below the mean in Literature, Science and Fine
Arts and hovered around it in other areas of the test. The
sophomores in the same college were significantly below the mean
in the same three areas plus Mathematics and significantly
above it in Current Social Problems.

Both classes in the College of Fine Arts were significantly
below the all-university mean on Current Social Problems,
History and Social Studies, Science and Mathematics, around
the mean in Literature and significantly above it in Fine Arts.
In the College of Home Economics, both classes were
significantly below the all-university mean on Current Social
Problems, History and Social Studies and Mathematics, the seniors
significantly below it in Literature and both classes around the
mean in the other areas.

When seniors and sophomores were compared, a few inter¬ 
class differences appeared. The sophomores as a group scored 
significantly higher than the seniors in Mathematics and lower 
in Fine Arts, Literature and Current Social Problems. In the 
Colleges of Liberal Arts and Fine Arts, the seniors were signifi¬ 
cantly higher in Literature and Fine Arts and the Business 
Administration seniors in Fine Arts.

An item analysis of this test was made, using all of the papers 
to determine the percentage of students in each class of the 
five colleges who responded to each item correctly. On Part I, 
Current Social Problems, the students as a whole did rather 
well. They were best informed on items concerned with labor 
unions and labor activities. Other attempts to classify the items 
failed to show any particular area that was either very good or 
very poor. 

Some of the Current Social Problems items, on which the 
students did rather poorly, are listed below. In selecting from 
teachers, farmers, industrial workers, white-collar workers and 
civil service employees, the group least likely to suffer from in¬ 
flation, less than 50 per cent of the students chose the correct 
answer. An item which called for the knowledge that the doc¬ 
trine of states’ rights was used as an argument against 
federal antilynching legislation was known by less than 30 
per cent of the students. Two other items, the period of life 
when the greatest incidence of tuberculosis occurs and the 
meaning of the term “Nisei” were likewise unknown to 50 per
cent of the students. Another item which called for a definition 
of “nationalization of industry” was answered correctly by 
less than 20 per cent of the students. 

A comparison of the results of the item analysis of Part II 
of this test, History and Social Studies, showed that on this 
part of the test, as on Part I, the students as a whole did quite 
well. As might be expected, items concerned with American 
history were easier than those related to European or Asiatic
history. Items related to psychology were answered very well
by all of the students.

Some interesting things appeared when individual items were
studied. On one item, the student selected from the following
(aristocratic, autonomous, autocratic, autarchic and anarchistic)
the one that best described a government controlled by the
will of one man. About per cent of the students answered
this item correctly. The item which asked whether the
election of Harding was a repudiation of the Republican Party,
Ku Klux Klan, Roman Catholic Church, capitalist system or
the League of Nations was likewise missed by about one-half
of the students. The concept of “Balance of Power” was also
unknown to about the same number of students. Perhaps one
of the most elementary things missed on this part of the test
was the type of government in existence in Switzerland.
Twenty-five per cent of the students answered this item
incorrectly.

On Part III of the test, Literature, a study of the item
analysis showed that, in general, the students did poorly. Many
students omitted a large number of items. There was evidence, 
however, that most of the items were attempted by most of 
the students because several items toward the end of the sec¬ 
tion were answered correctly by nearly every student. 

An attempt was made to see if there were specific areas such 
as American literature, English literature, poetry or drama in 
which the students did better than in others. A comparison of 
the items related to American literature with those concerned 
with English literature showed that the students did about the 
same in both areas. Results on items related to Graeco-Roman 
literature were quite poor for all groups. When the items were 
studied as to whether they pertained to poetry, drama, exposition,
etc., no evidence was found to show that the students
did better in one of these areas than in another. 

One rather interesting thing did appear from the item analysis.
Included in the ninety items on literature were ten items
which referred to Biblical characters or situations. Of these
ten items, the students answered only two of them well. Practically
everyone knew that Samson was distinguished for his
strength and that the walls of Jericho came down on the blowing
of trumpets. Less than a quarter of the students in all of the
colleges knew Lazarus as a beggar. About 35 per cent of all 
students were aware that St. Paul was converted to Christianity 
on the road to Damascus and the same percentage knew that 
Lot was rescued from the destruction of his city. The hand¬ 
writing on the wall was recognized as occurring in the Court 
of Belshazzar by about 30 per cent of the students; the father 
and son relationship of David and Solomon was known by about 
35 per cent of the students, and the fact that Joseph was sold 
as a slave to the Egyptians was common knowledge to only 
about 55 per cent of the students. On all of these items, the 
Liberal Arts students did only slightly better than the students 
in other colleges. 

On the recognition of authors, the item analysis revealed 
that the students were very well acquainted with O. Henry,
Pearl Buck, Rudyard Kipling, Robert L. Stevenson, Washington
Irving and Booth Tarkington; but the following authors,
whose writings are more difficult and more provocative of
thought, were quite unfamiliar to the majority of students:
Thomas Mann, Sholem Asch, Andre Maurois, Thomas Wolfe
and Aldous Huxley.

Several members of the English Department looked over this 
part of the test in order to judge whether each item was some¬ 
thing that a generally educated person should know. Of the 
ninety items, only seven were considered as being of too tech¬ 
nical a nature. 

A study of the item analysis of Part IV, Science, showed that, 
except for the College of Applied Science, students were rather 
poorly informed about general science. Of the sixty items com¬ 
posing this part of the test, only three stood out as being known 
by almost all of the students. These items were concerned with 
the recognition of the metallic element found in the red color¬ 
ing matter of blood, the tarnishing of silver as an example of 
oxidation and the major purpose of scientific investigation. 
One item which asked the students to select from the following 
the one that is not a science—organic chemistry, astronomy, 
bacteriology, geology and astrology—was answered correctly 
by as few as 60 per cent of the seniors in the College of Business 
Administration. Sophomores in the same college and students
in both classes of the Colleges of Fine Arts and Home Economics
did only slightly better on the same item. An item concerned
with the Mine fare so animals that produces eggs was missed
by almost i'j per cent of all students. The name of the gas
given off by a poorly damped furnace was unknown to 50 to
25 per cent of the students in all of the colleges except Applied
Science. Two items related to the use of the scientific method
were rather well done by all of the students except those in the
College of Fine Arts. An item on the cause of the formation of
dew at night and one on the time zones of the United States
were responded to correctly by about 60 per cent of all students.
The law of moments was applied correctly to a problem by 50
per cent of the Liberal Arts students and by from 40 to Jo
per cent of those in Business Administration, Fine Arts and
Home Economics. The item which stated that osmosis is a
process of oxidation, diffusion, absorption, reduction, or magnetic
attraction was answered correctly by about 45 per cent
of the Liberal Arts students, 40 per cent of the engineers, 20
per cent of the Business Administration students, 15 per cent
of the Fine Arts students and $0 per cent of those in Home
Economics.

In the groups other than engineering, less than 50 per cent 
knew the use made of a carpenter's level. The traditional story
of the bees and the birds and the flowers would have misfired
with these students as less than a quarter of them knew about
pollination. The general characteristics of man as a vertebrate
likewise were rather obscure to these students, with less than
half of them able to select from fish, sponge, oyster, lobster
and insect, the animal that is most similar in structure to man.

Even the Applied Science students, who scored as a group 
high on this part of the test, showed that they were not gen¬ 
erally educated in the area of science. A study of the item 
analysis revealed the specificity of their knowledge. In general, 
they did excellently on items concerned with mechanics, heat, 
light, sound and electricity, but, on items concerned with biology
and geology, they did no better than the students in the
other colleges. Several of the items concerned with the wearing
of white and woolen clothing showed that the engineers
transferred their knowledge of heat and light rather poorly to
actual life situations, with about 30 per cent of the students
missing the items.

Four members of the faculty, one each in the areas of bac¬ 
teriology, chemistry, physics and zoology, were asked to look 
over the items on this part of the test and to consider them in 
the same manner as the members of the English Department 
were asked to treat the Literature items. The group as a whole 
thought that this part of the test was a rather good test of 
general education in the area of science. A half dozen items
were considered to be too specific to be included in a test of 
general education. 

The item analysis of Part V of the test, Fine Arts, showed 
that on the whole the students were rather poorly informed in 
the various areas of the fine arts, except for students in the 
College of Fine Arts. But even in this group, unexpectedly poor
results showed for many of the items. 

Some of the more interesting results are noted below. Practically
all of the students located the Hanging Gardens as
having been in Babylon, but only from one-half to three- 
quarters of all the students in all colleges knew that Serge 
Koussevitsky was an orchestra conductor. Items concerned 
with contemporary fine arts were poorly answered. For ex¬ 
ample, 40 per cent or less of all students (except Fine Arts 
seniors, 59 per cent) recognized Thomas Benton as a contem¬ 
porary American painter and less than a third of all the stu¬ 
dents identified Jacob Epstein as a modern sculptor. Salvador 
Dali fared a little better with from 50 to 80 per cent of the stu¬ 
dents recognizing an outstanding characteristic of his work. 
In music the situation was about the same. Sixty per cent or 
less of the students knew who wrote the Stalingrad Symphony 
and even fewer recognized the composers of Oklahoma.

Three members of the faculty of the College of Fine Arts 
went over the ninety items of this part in the same manner 
as faculty members treated the other areas. With a few exceptions,
most of the items were thought to be concerned with
things that a “generally educated” person should know in the 
area of Fine Arts. 

The item analysis of Part VI, Mathematics, showed that this 
was the most difficult part of the test. The students’ papers
were studied to see just how far the various groups went through
the test. Students in both classes of the College of Applied
Science attempted nearly all of the items. The median last
item attempted was fifty-six for both classes of the College of
Liberal Arts. (There are sixty items on this part of the test.)
For students in the College of Business Administration the
median last item attempted was fifty-six for the seniors and
fifty-nine for the sophomores. In Fine Arts this median dropped
to forty-five for the seniors and to fifty-two for the sophomores
and in the College of Home Economics, the median was fifty-
one for the seniors and fifty-eight for the sophomores. 
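
The "median last item attempted" reported above can be illustrated with a minimal sketch. The answer papers below are hypothetical, and so is the convention that an omitted item is recorded as `None`; the original study does not describe how omissions were tallied.

```python
from statistics import median


def last_item_attempted(responses):
    """Return the 1-based position of the last answered item (0 if none).

    `responses` holds one entry per item; None marks an omitted item.
    """
    for position in range(len(responses), 0, -1):
        if responses[position - 1] is not None:
            return position
    return 0


# Hypothetical answer papers for a five-item test (None = omitted).
papers = [
    ["b", "a", "d", None, None],   # last item attempted: 3
    ["c", "a", None, "b", None],   # last item attempted: 4
    ["a", "a", "d", "b", "c"],     # last item attempted: 5
]

median_last_item = median(last_item_attempted(p) for p in papers)
```

Note that an item skipped in the middle of a paper does not lower the figure; only the position of the final answered item counts.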

A study of this item analysis showed that most students 
could perform the simpler arithmetical and algebraic opera¬ 
tions. An item concerned with the extra cost involved when 
articles are purchased on the installment plan showed that 
25 per cent of the students had no idea how to figure correctly 
such a common every-day problem. Forty-five per cent of the 
students were able to compute the annual interest on a short¬ 
term loan. A simple question involving buying and selling was 
solved correctly by about 2? per cent of the students outside 
of the College of Applied Science. Sixty-five per cent of the 
Applied Science students solved the problem correctly. A problem
in thinking with symbols (how many minutes are there in
“p” days) was also difficult, for 30 per cent or more of the
students, other than engineers, could not solve it. The concept
of a converse and the ability to state one was also missed by 
more than one-half of the students. Similarly the concept of an 
axiom was unknown to about 70 per cent of the students. 

Course programs of a sample of thirty students, members of 
the Class of 1947, were analyzed to determine the number of 
hours the students carried in the various areas of general edu¬ 
cation. On the basis of this analysis and on the study of var¬ 
ious course patterns as stated in the catalogs of the different 
colleges of the University, estimates were made of the number 
of credit hours the students in the different colleges carried in 
various areas of general education. (Estimates of the general 
education courses of Liberal Arts students were made entirely 
from the catalog.) The discussion which follows covers the 
five areas of general education included in the Cooperative 
General Culture Test.

The five colleges were ranked on the basis of the amount of
course work in each of the several areas of general education 
received by the students in each college. The mean scores of 
the students in the five colleges on the six parts of the test were 
also placed in rank order. A comparison of the rank order of the 
number of courses taken and of the mean scores on the six 
parts of the General Culture Test showed that there was in gen¬ 
eral a rather close similarity between the rank order of the num¬ 
ber of hours taken in an area of general education and the rank 
order of the mean score of the students in the five colleges on 
the part of the General Culture Test related to that area. The 
area which deviated most from this was the social studies. 
Here the Applied Science seniors, who ranked fourth in courses 
taken in this area, tied for first place with the Liberal Arts 
seniors who ranked first in the number of courses taken. In the 
Sophomore Class, the Business Administration students, who 
ranked second in the number of courses taken, tied with the 
Liberal Arts students, rank one in courses taken, for first place. 
On the History and Social Studies part of the General Culture 
Test, both classes of the College of Applied Science, rank four 
in the number of courses taken, ranked second in their mean 
scores on the test. 

In the area of Literature, the Business Administration seniors, 
who were tied for lowest place in the number of hours taken in 
this area, were placed in third position with their mean score on 
the culture test. The scores of both classes of the other colleges
ranked the same as the amount of literature studied with minor 
variations. In the area of Science there was almost a perfect 
relationship between the number of courses taken in science 
and their mean scores on this part of the test. 

On the Fine Arts part of the test, both of the Liberal Arts 
classes, which ranked third in the number of courses taken, 
changed places with Home Economics, rank two. A similar 
switch occurred in Mathematics, in which both classes of the 
Liberal Arts College, rank three in courses taken, changed 
places with Business Administration, rank two in courses taken, 
on the rank of the mean score on this part of the Cooperative 
General Culture Test. 
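
The agreement between two rank orders of the kind compared above can be summarized with a rank-difference coefficient. The sketch below is an illustration only: the study reports the comparison verbally and does not state that any coefficient was computed, and the ranks shown are hypothetical. The formula used assumes no tied ranks; the ties mentioned in the text would call for averaged ranks, for which the formula is only approximate.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank-difference coefficient for two untied rankings."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rankings must have the same length")
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks of five colleges: by course hours taken in an area,
# and by mean score on the matching part of the test. The second and
# third colleges change places, as in the Fine Arts comparison.
rank_by_courses = [1, 2, 3, 4, 5]
rank_by_score = [1, 3, 2, 4, 5]

rho = spearman_rho(rank_by_courses, rank_by_score)
```

With only five colleges a single exchange of adjacent ranks still leaves a coefficient of about .90, which is why the text can describe the relationship as close despite the switches it notes.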

The Time Magazine Current Affairs Test.—This test was
administered as an untimed test and the students were not
required to put their names on their papers. The test, as it
was made up, consisted of eight parts: U. S. Affairs, Map,
International, Foreign News, Canada, Science, The Arts and
Personalities. However, in scoring the test, four of these parts
(Map, International, Foreign News and Canada) were combined
into one part which was named “World Affairs.” This
was done chiefly because of the small number of items in each 
of these four parts of the test. 

Mean-total and part scores for the two classes in each of the
five colleges were computed. No significant differences were
found between the seniors and the sophomores for the University
as a whole on the total scores and sub-test scores. (The
statistical techniques used here were the same as used in comparing
the results of the Cooperative General Culture Test.)
When mean total scores of each college and class were compared
with the all-university mean for each class, it was found
that the seniors in the College of Applied Science were significantly
above the mean (5 per cent level), the sophomores in the
College of Business Administration were in a similar situation, and
that both classes of the Colleges of Fine Arts and Home Economics
were significantly below the mean (1 per cent level).
When the mean scores of parts of the test were analyzed,
it was found that both classes of the Colleges of Fine Arts and
Home Economics were significantly below the mean on most
parts of this test. The exceptions were that both classes of Fine
Arts approximated the mean on the part entitled “The Arts”
and the sophomores in Home Economics were below the mean,
but not significantly so, in Science and The Arts. The Applied
Science seniors and Business Administration sophomores were
above the mean on U. S. Affairs (5 per cent level). In World
Affairs, the Liberal Arts seniors were significantly above the
mean (5 per cent level). In Science both classes of the engineering
school were significantly above the mean at the 5 per cent
level. The sophomores in the same college were significantly
below the mean in The Arts (5 per cent level). In Personalities
both Liberal Arts seniors and the two classes of Business Ad¬ 
ministration were significantly above the mean. 

When results for the two classes in the same college were 
compared, the only difference between seniors and sophomores
appeared in the College of Home Economics where the seniors
did significantly better on U. S. Affairs and World Affairs
(both 1 per cent level) and were higher on their total scores than 
the sophomores (5 per cent level). 

A comparison of the rank order of the mean scores of the 
different colleges on the various parts of this test with the 
number of courses taken in an area showed results similar to 
those obtained when scores on the Cooperative General Culture 
Test were compared with number of courses taken. The Applied
Science students similarly scored high on the social studies
parts of this test. The seniors ranked first on U. S. Affairs and 
second on World Affairs and the sophomores second on U. S. 
Affairs and first in World Affairs. In the number of social studies 
courses taken, these engineering students ranked fourth. 

Summary of Findings 

1. On the Cooperative General Culture Test, Syracuse University
students ranked high according to national standards.
Converting the mean scores of Table 1 into percentile scores 
placed the mean-total score of the seniors at the 78th percentile 
and of the sophomores at the 76th percentile on national norms. 
The average total score of students in the five colleges was well 
above the national average in all cases and as high as the top 
11 per cent in the best case. These rather high mean percentile
scores are due in part to the high scores the students made in 
their special areas of study and are not a reflection of a well- 
balanced program of general education. In the areas of the 
test related to the students’ field of specialization, the scores 
averaged from the 75th to the 95th percentiles, but in the areas 
outside of the students’ major fields the scores averaged from 
the 50th to the 70th percentiles. 

2. When total scores on the Cooperative General Culture Test 
were compared, it was found that students in both classes of 
the Colleges of Liberal Arts and Applied Science scored signif¬ 
icantly above the all-university mean. Seniors in the College 
of Business Administration and both classes of Fine Arts and 
Home Economics achieved significantly below this all-univer¬ 
sity mean. 

3. Achievement in the various areas measured by this test is
definitely related to the amount and pattern of course work
taken in those areas by students; and, even in the major field,
students’ knowledge tends to be specific to course rather than
general. Students in Applied Science scored highest on the
Science and Mathematics parts of the test; students in Fine
Arts scored highest on the Fine Arts part of the test, etc. When
the high scores on a part of the test are further examined, the
specificity of the students’ education is brought into sharper
focus. For example, the Applied Science students did well on 
the items pertaining to physics and chemistry, but relatively 
poorly on items dealing with the biological and geological 
sciences and on items calling for practical applications of 
scientific principles to daily life. 

4. In areas of study outside of their major fields, students 
scored relatively poorly on the test. For example, Fine Arts 
students scored relatively low on Science, Mathematics, and 
on the parts of the test related to the social sciences. Applied 
Science students scored relatively low on Literature and Fine 
Arts; Business Administration students scored relatively low on 
Literature, Science and Fine Arts; Home Economics students 
scored relatively low on Literature and Mathematics. Liberal
Arts students, on the other hand, scored relatively high on
all parts of the test. Similarly, the engineering students scored
relatively high in the area of the social studies. 

5. There is apparently no significant increment to general 
education during the last two years of college residence as the 
seniors scored no higher, or not significantly so, than the sophomores
on the Cooperative General Culture Test.

6. On the Time Magazine Current Affairs Test, students in
the Colleges of Liberal Arts, Applied Science and Business Ad¬ 
ministration scored relatively high, whereas students in the 
Colleges of Fine Arts and Home Economics scored relatively 
low. The total score of the seniors on this test was not significantly
higher than that of the sophomores. The typical Syracuse
student was able to answer about half of the items on this
test correctly.

TEST: II. SELECTION OF ITEMS FOR
FINAL SCALES ¹

Vocabulary items are among the most frequently used
components of mental tests. They are, as a rule, relatively re¬ 
liable and valid, take little time to administer in comparison 
with their usefulness, and can be given and answered in so 
many ways that they can often be used successfully where 
other items fail, as in the case of spastic children and certain 
aphasic adults. For general clinical use, a test should not be 
dependent upon the skills of reading and writing, and should 
avoid the ambiguities inherent in administering and scoring 
items calling for definitions by the testee. On the other hand, 
the test should be, of course, as reliable and valid as possible, 
should be short and easy to administer, and should have considerable
intrinsic interest value.

Vocabulary items make up what are probably the best single 
subtests in the 1937 revision of the Stanford-Binet (11) and
the Wechsler-Bellevue Adult Intelligence Scale (13). Terman and 
Merrill report an average correlation of .81 for separate age 
groups between the Stanford-Binet vocabulary test score and 
the mental age on their scale as a whole, while Wechsler states 
that his vocabulary subtest correlated .85 (eta) with the total
scale for the original standardization group. With these high
validities in mind, a search was made by the senior author for 

¹ Acknowledgment is due Professor F. Y. Billingslea, Mrs. Helen S. Ammons and
Mr. Neil W. Coppinger of Tulane University for reading the manuscript critically and
offering many helpful suggestions. The test plates and a manual with final scale norms,
answer sheets and instructions for administration (1) may be obtained from R. B.
Ammons,

a method of vocabulary testing which would meet the clinical
criteria already outlined. 

The most promising technique located seemed to be that 
used by Van Alstyne (12), where a child was asked to choose 
from among four pictures on a card the one which illustrated 
a particular language concept, word, or phrase. Since this test 
had been given only a very limited standardization and sev¬ 
eral pictures were out of date, Ammons and Huth (4) set up a 
new set of 16 plates and tried out a considerable number of 
items with a small group of children. An analysis of the results 
from the try-out showed that this type of test could be given 
quickly, was useful at least through the ages of 6 to 17, and 
was highly reliable and valid. On this basis, a series of studies 
(2, 3, 5, 6, 7) was undertaken to construct and standardize a
test for all levels of verbal ability. The present paper is the first 
in the series reporting this work. 

After a testing technique has been decided upon, at least 
three major problems present themselves to the constructor of a 
vocabulary test: (a) how to obtain items of a suitably wide 
range of difficulty, (b) how to select items of satisfactory 
representativeness of content, and (c) how to choose items valid 
for the estimation of differences in level of intellectual ability.
It is conceivable that random sampling of all word meanings in 
a fairly large dictionary would provide a partial solution to 
these problems. Variations of this method have been used 
frequently. Seashore and Eckerson (10) selected a word from
each left-hand page of a large dictionary, omitting prefixes,
suffixes and abbreviations, and obtained a total of 1320 preliminary
items. Similarly, Atwell and Wells (8) chose 100 words
“by chance” from a 22,000-word dictionary. The preliminary
form of the Wechsler-Bellevue vocabulary test (13) was a list
of 100 words, one each chosen from the top of every fifth page 
in a school dictionary, omitting “obsolete, technical, or 
esoteric words." 
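
The page-based sampling schemes described above amount to systematic sampling. A minimal sketch follows, assuming a dictionary represented as a list of pages, each page being a list of headwords in printed order; the tiny dictionary, the function name, and the choice to begin with the fifth page are all hypothetical. The exclusion of obsolete, technical, or esoteric words is a matter of judgment and is not modeled.

```python
def sample_top_of_every_nth_page(pages, step=5):
    """Take the first headword on every `step`-th page (pages 5, 10, ...).

    `pages` is a list of pages; each page is a list of headwords in
    printed order. Empty pages are skipped.
    """
    return [
        page[0]
        for number, page in enumerate(pages, start=1)
        if number % step == 0 and page
    ]


# Hypothetical 20-page dictionary with three headwords per page.
dictionary_pages = [
    [f"word{page}_{slot}" for slot in range(1, 4)] for page in range(1, 21)
]

preliminary_items = sample_top_of_every_nth_page(dictionary_pages, step=5)
```

A list of 100 words drawn this way would require a dictionary of about 500 pages, which shows how the page interval fixes the size of the preliminary list.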

In practice, random selection of vocabulary items does not 
work out particularly well for a number of reasons. If item 
selection techniques are to be employed in the choice of a final 
scale, randomness is lost. Word meanings should probably be 
used as the original population, rather than words themselves.
In a multiple-choice test, the precision of meaning tested is a
function of the alternative words used with the given item. 
Finally, if one uses the picture vocabulary technique, certain 
words cannot well be represented, and item difficulty is deter¬ 
mined to a considerable degree by the nature of the drawings 
themselves and the alternate drawings. For these reasons, in 
this test no attempt at randomness was made and our initial 
words were merely subjectively selected to be as representative 
as possible, on the basis of the pictures already available. An 
analysis of the results as presented later in this paper seems to 
justify this approach. 

The problem of representativeness of content was thus 
handled subjectively. A suitable range of difficulty was obtained
by a choice of items after testing. Several possible alternatives 
present themselves when one wishes to select items for validity: 
suitability of material can be estimated subjectively, indi¬ 
vidual item correlations with total scale score can be used, and 
correlations of items with outside criteria such as age or mental 
test results can be computed. It will be seen later that a com¬ 
bination of all these with several more specific criteria was 
actually used. 
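
One of the alternatives named above, the correlation of an individual item with the total scale score, can be sketched as follows. The pass-fail record and total scores are hypothetical, and the product-moment coefficient on a 0/1 item (the point-biserial) is offered as one common choice; the source does not say which coefficient, if any, was computed at this stage.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data for six subjects: 1 = item passed, 0 = item failed,
# alongside each subject's total score on the scale.
item_passed = [0, 0, 1, 1, 1, 1]
total_score = [10, 12, 15, 18, 20, 25]

item_total_r = pearson_r(item_passed, total_score)
```

Because the item contributes to the total, a coefficient of this kind is somewhat inflated for short scales; the same function serves for correlations with outside criteria such as age or mental test results.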

Problem 

The purpose of this study was to obtain a suitable group of 
vocabulary items and to set up the two final forms of a picture 
vocabulary test, based on the 16 4-picture plates developed by 
Ammons and Huth (4). To accomplish this, it was necessary to 
find a large number of words appropriate to the cards, to try 
these out on a representative population, and to select those 
items meeting the criteria established. 

Procedure 

Materials.—Item selection and testing centered around 16
4-picture plates (1). With the plates already available, the next 
step was the discovery of a large number and variety of poten¬ 
tially good items to administer to the standardization group. To 
start with, dictionaries were checked and advanced students in 
psychology verbally associated with the plates as stimuli. 
From these sources, 243 words pertinent to the pictures were
obtained in addition to the 48 selected by Ammons and Huth
(4), a total of 291. Of these, 43 were eliminated because of
obvious ambiguity, probable sex differences in experience with
them, or regional meaning, leaving 248 items for pretesting.

Pretesting consisted of administering these 248 items with 
their associated plates to a small sample of children and adults 
varying widely in age and ability.* Four children were tested at
each CA level 2 through 17, and four college students at each
Wechsler IQ level 99-109, 112-119, 121-129, 131-138, and
140-144. For children 2 through 5 results were available from
Form L of the Stanford-Binet; those 6 through 17 were given
the vocabulary test of Form L; while full Wechsler scale results
were available for the college students. These estimates of
verbal ability were later used in the ranking of items by difficulty
for setting up the test finally given to the standardization
group. Two males and two females were tested at all but 3 of
the 21 levels. The college students ranged in ability as already
noted; the 2- to 5-year-olds had Binet IQ's between 90 and 110;
while the school children 6 to 17 years old were judged by their 
teachers as being average in intelligence. Tests were adminis¬ 
tered in the same way as outlined in the procedure section 
for the standardization group. 

After all 84 subjects had been given the appropriate intel¬ 
ligence test and the picture vocabulary test, the number of 
correct answers was tabulated for each item by age levels, and 
moving averages were calculated using five successive points. 
Per cent passing was estimated, as in Ammons and Huth’s 
study, on the basis of these moving averages between ages. 
Twenty-two more items were eliminated, either because they 
discriminated poorly between successive age levels, or because 
there were too many items passed by 50 per cent of the subjects 
at a given level. 
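
The smoothing step described above can be sketched in a few lines. The counts are hypothetical, and two details are assumptions: that only complete five-point windows were averaged, and that per cent passing was taken as the smoothed count divided by the four subjects tested at each level. The source does not specify how the ends of the age range were handled.

```python
def moving_average(values, window=5):
    """Average each run of `window` successive values (complete runs only)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


SUBJECTS_PER_LEVEL = 4  # four subjects were pretested at each level

# Hypothetical number of correct answers to one item at eight
# successive age levels.
correct_by_level = [0, 1, 1, 2, 3, 3, 4, 4]

smoothed = moving_average(correct_by_level, window=5)
per_cent_passing = [100 * value / SUBJECTS_PER_LEVEL for value in smoothed]
```

With only four subjects per level the raw per cent passing can move only in steps of 25; averaging five successive levels yields a much finer gradation, from which the level at which 50 per cent pass can be read.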

The resulting 226 words, including 33 remaining from Am¬ 
mons and Huth’s 48, were then listed by plates and by diffi¬ 
culty level, difficulty level being the estimated MA at the 50 
per cent passing point, in terms of the intelligence tests given. 
The items in order of difficulty were: 

* Thanks are due Mr. William L. Miller and Mr. Alvin Yordy, principals in the
Denver Public Schools, and Mr. Gene Gullette of Englewood High School, for
making subjects available.

Plate 1: pie, window, dessert, vegetable, human, seed, pane,
sill, ventilation, agriculture, anti-socialness, transparent, rec¬ 
tangular, translucent, culinary, sector, illumination, intimida¬ 
tion, segment, depredation, physiognomy, egress. 

Plate 2: wagon, dancing, teacher, phonograph, partners, athletes,
transport, competition, revelry, terpsichorean, ebullience.

Plate 3: car, fight, boxing, counter, pump, customer, paying,
clerk, fuel, sale, sport, purchase, gauge, merchant, competition,
recreation, petroleum, retaliation, replenishment, pugnacity, 
conveyance, aggressiveness, transaction. 

Plate 4: chimney, park, shrubbery, panels, dwelling, veranda, 
panorama, urban, domicile. 

Plate 5: presents, island, surf, isolation, munificence.

Plate 6: bird, horse, fly, wagon, transportation, insect, conveyance,
antiquated.

Plate 7: race, catching, uniform, sport, discussion, skill, pas¬ 
sion, affection, flight, impact, amour, dialogue, discourse. 

Plate 8: house, clothes, firecracker, basket, music, laundry, 
clean, explosion, sudden, garment, neglect, dehydration, deto¬ 
nation. 

Plate 9: farm, manufacturing, skyscraper, landscape, currency, 
industrial, pecuniary, tranquillity, agrarian.

Plate 10: chair, cup, spoon, furniture, razor, thermometer, 
steel, refreshment, liquid, mercury, container, grooming, beverage,
centigrade, tonsorial.

Plate 11: clock, circle, numbers, locket, engraving, lobe, sentiment,
appendage, chronometer, pendant.

Plate 12: food, meal, afraid, hot, fear, startling, nutrition, 
perspiration, tattered, vagabond, gorging, poverty, glutton, il¬ 
legality, felony, humid, vagrant, coercion, mastication, desti¬ 
tute, gourmand, itinerant, insatiable, repast, corpulence, sudor¬ 
ific, mendicant. 

Plate 13: telephone, accident, crying, cheerful, collision, destruction,
vehicles, mishap, portrait, transmitter, sympathy,
propulsion, communication, consolation, condolence, negli¬ 
gence, bereaved, lacrimation, deleterious. 

Plate 14: policeman, safe, uniform, listening, broadcast, danger, 
protection, authority, disaster, gravitation, catastrophe, con¬ 
stabulary, fortuitous. 

Plate 15: bathtub, bed, chair, newspaper, operation, illness, 
anaesthesia, cleanliness, aseptic, crisis, leisure, immersion, re¬ 
cumbent, somnolent, displacement, perusing, supine. 

Plate 16: airplane, train, propellers, locomotive, intersection, 
harbor, aviation, altitude, marine, fuselage, nautical, roadstead. 

Subjects.—The test was administered to 600 white American-born
subjects ranging from age two to thirty-four years inclusive.
Table 1 shows the number of subjects of each sex tested
at each age level. Numbers are not equal because correct grade 
placement was a primary control, rather than age, with the
school children. Thirty were tested at each grade level, but 11
18-year-olds were discarded because it did not seem possible to
obtain a reasonably unbiased sample at this age level.

Between the ages of 2 and 17 inclusive, the subjects were
selected by age or grade levels with respect to the fathers’ oc¬ 
cupations in direct proportion to a ten-group socio-economic 
breakdown as presented in the index of gainfully employed 
white males of the 1940 United States census (14). Where 

TABLE 1 

Number and Average Chronological Age of Subjects Tested at Each Chronological Age
Level in the Present Standardization Group


Males Females Total
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* Years. 

numbers per age level were below one subject, age levels were 
combined. 

For the adult group, males and females were separately con¬ 
sidered in direct proportion to the occupational status of white 
males and females between the ages of 18 and 34 as given in the 
census reports (14). The urban sample was obtained in the 
Denver area from private, parochial, and public schools; busi¬ 
ness establishments; amusement parks; and homes. The rural 
sample was secured from rural districts in Colorado and Ne¬ 
braska. More detailed information about the sampling controls 
is given in articles dealing with the subgroups of the standardization
population (3, 5, 6, 7).
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'Testing. —Preschool-age children were tested in their homes 
or in special rooms at day-care centers; school children were 
brought to rooms provided by the schools for testing; and adults 
were tested in their own homes, in testing rooms of industrial 
firms, in parks, or in a church office. All subjects were given 
an intelligence test and the standardization picture vocabulary 
test of 226 tentative items. The full Stanford-Binet, Form L, 
was given from ages 2 to 5, the Stanford-Binet vocabulary test 
from ages 6 to 17, and the Wechsler vocabulary to adults. 
Standard administration procedures were followed for each 
test (11, 13). The picture vocabulary was given first to all 
groups but the adults. 

Since testing was done by a number of examiners, a detailed 
procedure including a set of instructions was set up. The subject 
was seated opposite the examiner, with plates and recording 
sheet out of sight. The session was started by asking for personal 
information, such as name, age, and occupation of head of 
family or own occupation. The subject was told he was to be
asked some questions that he could answer by pointing to one 
of the four pictures on a plate. It was explained that some items 
would be too hard for him, and that he should not guess, but 
just say "I don’t know.” Doubtful items were checked by asking 
the subject why he made a certain choice, asking him to define 
the item verbally, or repeating the item later. This seemed to 
discourage guessing almost completely.

Items were scored right or wrong, and testing proceeded on 
a given plate until three successive items had been failed and 
three successive items passed. This was considered sufficient, 
since the items had been arranged in order of difficulty after 
pretesting, and items beyond the three-consecutive-pass and fail 
levels could reasonably be assumed to be passed or failed. In 
order to maintain rapport, the tester was free to introduce 
easier words at any point. Testing was started on successive 
plates at the subject’s mental level as estimated from responses 
to preceding plates. 
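
The stopping rule described above can be sketched in code. This is a minimal illustration only, not the authors' scoring procedure: the function name `score_plate`, the dictionary input, and the error handling are assumptions introduced here. Items are assumed to be indexed in order of difficulty within a plate.

```python
def score_plate(n_items, responses):
    """Fill in unadministered items on one plate.

    n_items   -- number of items on the plate, ordered easy to hard
    responses -- dict {item_index: True/False} for items actually given
    Items below the first run of three passes count as passed; items
    above the first run of three failures count as failed.
    """
    def first_run(value):
        # Index where the first run of three consecutive `value` results starts.
        for i in range(n_items - 2):
            if all(responses.get(j) is value for j in (i, i + 1, i + 2)):
                return i
        return None

    basal = first_run(True)
    ceiling = first_run(False)
    if basal is None or ceiling is None:
        raise ValueError("testing stopped before both runs were reached")

    scored = []
    for k in range(n_items):
        if k in responses:
            scored.append(responses[k])      # observed result
        elif k < basal:
            scored.append(True)              # assumed passed
        elif k > ceiling + 2:
            scored.append(False)             # assumed failed
        else:
            raise ValueError(f"item {k} lies between the runs but was not given")
    return scored
```

For example, with eleven items and responses recorded for items 2 through 9, items 0 and 1 are credited as passed and item 10 as failed.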

Item selection .—After the 589 subjects had been tested with 
the 226-word preliminary scale, an item selection was made. As
a first step, all correct responses were tabulated by age, sex of 
subjects, and item. Words below the three-consecutive-pass 
level for each individual on each plate were considered as passed 
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and those above the three-consecutive-fail level as failed. In 
order to make all values of passes comparable, per cent passing 
was calculated from number passing and number theoretically 
attempting. 

A CA or adult index number was found corresponding to the 
50 per cent passing point for each item for the whole group, by 
interpolation if necessary. For example: 

                    CA levels

Word            8     9    10    11

                Per cent passing

Shrubbery      21    31    56    69

The point where 50 per cent would pass lies between CA’s 9 and
10, actually at 9.8 by interpolation. CA’s were used in calculating
the 50 per cent passing point through age 17, while index
numbers were assigned to six adult levels set up on the basis of
Wechsler vocabulary scores. A word with a rating of A1.5
would have been passed by less than half of the lowest 20 adults
(A1) and more than half of the next to lowest 20 adults (A2). A
word with a rating of A6.5 would have been passed by less than
half of the highest 20 adults (A6). Thus, relative difficulties
were computed for all words in terms of 50 per cent passing
points and were indicated by CA or adult index number. CA
17 and A3 were considered to be equal levels, and difficulties can
be figured from below 2 to A6 in one series on this basis.
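
The interpolation of the 50 per cent passing point can be sketched as follows. This is an illustrative reading of the procedure, assuming simple linear interpolation between adjacent age levels; the function name `fifty_percent_point` is hypothetical. The first percentage for "shrubbery" is garbled in the print and is read here as 21, which does not affect the result, since only the values at CA 9 and CA 10 enter the calculation.

```python
def fifty_percent_point(levels, pct_passing, target=50.0):
    """Linearly interpolate the level at which `target` per cent pass.

    levels      -- age levels in ascending order
    pct_passing -- per cent passing at each level
    Returns None if the target is never crossed.
    """
    pairs = list(zip(levels, pct_passing))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if p0 < target <= p1:
            return x0 + (target - p0) * (x1 - x0) / (p1 - p0)
    return None


# Shrubbery: 31 per cent pass at CA 9 and 56 per cent at CA 10.
shrubbery = fifty_percent_point([8, 9, 10, 11], [21, 31, 56, 69])
```

Here the crossing lies 19/25 of the way from CA 9 to CA 10, which rounds to the 9.8 given in the text.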

Items were rejected for the following reasons: (a) inadequate 
discrimination in per cent passing between successive age levels, 
(b) regional meaning, (c) sex difference in difficulty, (d) am¬ 
biguity of denotation, (e) same item already used with another 
plate, (f) too many words at a given age level. 

(a) Words were thrown out where nearly the same number 
of subjects passed on several successive age levels, or an item 
was harder for a more advanced group. For example: 

                    CA levels

Word            8     9    10    11    12

                Per cent passing

Gauge          24    31    61    50    72

Words eliminated on this basis were: pane, gauge, veranda,
affection, neglect, landscape, startling, tattered, vagabond, illegality,
vagrant, destitute, transmitter, disaster, aseptic, leisure,
recumbent, marine, physiognomy, conveyance, detonation,
grooming, and illness. It will be noted in the following listings
that several words were rejected for more than one reason.
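
Criterion (a) can be illustrated with a small check on the per cent passing at successive age levels. This is a sketch under stated assumptions: the authors give no numerical cutoff, so the `min_gain` threshold and the function name `discrimination_flags` are hypothetical.

```python
def discrimination_flags(pct_passing, min_gain=5.0):
    """Flag weak discrimination between successive age levels.

    Returns two lists of positions (index of the later level in each pair):
    reversals -- the older group did worse than the younger group
    flat      -- the gain was non-negative but below `min_gain` points
    `min_gain` is an illustrative threshold, not a value from the paper.
    """
    reversals, flat = [], []
    for i in range(1, len(pct_passing)):
        gain = pct_passing[i] - pct_passing[i - 1]
        if gain < 0:
            reversals.append(i)
        elif gain < min_gain:
            flat.append(i)
    return reversals, flat


# "Gauge" at CA 8 through 12, as given in the example above.
gauge_reversals, gauge_flat = discrimination_flags([24, 31, 61, 50, 72])
```

For "gauge" the only flag is the reversal at the fourth level, where per cent passing drops from 61 to 50.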

(b) The following words were eliminated because of poten¬ 
tially varying difficulty depending on regional experience differ¬ 
ences: urban, grooming, roadstead, partners, petroleum, aseptic, 
aviation, altitude, marine, fuselage. 

(c) A separate tabulation was made of the number of males 
and females above and below the 50 per cent passing point for 
each word. Where there were marked discrepancies in item diffi¬ 
culty between the sexes, a chi-square test (9) was run. Apparent 
differences between the sexes at beyond the one per cent level 
were noted in the case of the following words: detonation, avi¬ 
ation, altitude, fuselage, partners, tattered, vagabond, illegality, 
vagrant, and destitute. These were eliminated, although it is 
realized that such marked sex differences would of course occur 
a number of times by chance in this large a number of words. 
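
The sex-difference check in (c) can be sketched as a chi-square test on a 2 x 2 table. The paper does not say whether a continuity correction was applied, so this sketch assumes none; the counts in the example are invented for illustration and are not data from the study.

```python
CRITICAL_1_PERCENT_DF1 = 6.635  # chi-square critical value, 1 df, one per cent level


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]].

    For example: rows are males and females, columns are the number
    of subjects passing and failing the word.
    No continuity correction is applied (an assumption of this sketch).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical counts: 40 of 50 males pass, 20 of 50 females pass.
chi2 = chi_square_2x2(40, 10, 20, 30)
significant = chi2 > CRITICAL_1_PERCENT_DF1
```

With more than 200 words tested at the one per cent level, a few such differences would be expected by chance alone, which is the caution the authors raise.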

(d) Several words were rejected because they potentially 
referred to two different drawings on the same plate: customer, 
boxing, competition, catching, flight, glutton, illness, partners, 
and merchant. 

(e) “Sport” and “uniform” were tried out on two different 
cards and the better card-word combination on the basis of the 
other criteria was kept. 

(f) It was decided to have 10 words at each level from below 
1 to 5 years, 8 at each level from 6 through 16, and 8 at each 
adult level 3 through 6, or a total of 170 words in the final scale. 
Where there were too few words, as at levels 2, 4, 5, 8, 11, 12, 
14, A3, and A5, words were borrowed from adjacent levels. 
When a minimum number of words had been assigned to each 
level below —1 to 16 and A3 to A6, the surplus words were 
eliminated in the order that they failed to meet the other 
criteria. It should be noted in this connection that several of the 
criteria were only relative and subjectively applicable to begin 
with, and this final process of eliminating on the basis of an 
oversupply of words at a given level led to further qualitative
differences between the items used with the various plates. 

The final step was to divide the 170 items into two forms 
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equal in length and as equal in difficulty as possible. The words 
were therefore arranged in order of difficulty without respect to 
plate, and assigned in groups of four, the first and fourth of 
each group going to Form A, and the second and third to
Form B.
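
The assignment of words to forms can be sketched as below. This is an illustration of the first-and-fourth versus second-and-third rule only; the function name `split_forms` and the numeric difficulty values are assumptions (adult index levels would first have to be placed on one numeric series with the CA levels).

```python
def split_forms(items):
    """Split (word, difficulty) pairs into two forms of matched difficulty.

    Items are sorted by difficulty and taken in groups of four; the first
    and fourth of each group go to Form A, the second and third to Form B.
    """
    ordered = sorted(items, key=lambda item: item[1])
    form_a, form_b = [], []
    for position, item in enumerate(ordered):
        if position % 4 in (0, 3):
            form_a.append(item)
        else:
            form_b.append(item)
    return form_a, form_b


# 170 placeholder items with difficulties 0..169 (not the real word list).
demo_a, demo_b = split_forms([(f"word{i}", i) for i in range(170)])
```

With 170 items the rule yields 85 items per form, and within each complete group of four the two forms receive the same total difficulty when the steps between items are equal.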


Results 

Following are the 85 words finally chosen for Form A, with
their difficulty levels indicated:

Plate 1: pie (1.7), window (1.7), seed (6.5), sill (6.7), transparent
(13.3), rectangular (14.7), sector (16.0), illumination
(10.0), culinary (17.2), egress (A6.3).

Plate 2: athletes (8.6), competition (15.0), revelry (A4.0),
ebullience (A6.4).

Plate 3: counter (4.0), pump (4.4), clerk (6.4), sport (7.6),
recreation (10.8), pugnacity (16.9), replenishment (A3.1), retaliation
(A4.1).

Plate 4: shrubbery (9.8), dwelling (11.7).

Plate 5: surf (12.5), isolation (12.9). 

Plate 6: horse (1.5), wagon (2.3), insect (6.7), transportation
(8.6), antiquated (A5.8).

Plate 7: discussion (7.7), skill (10.9), amour (13.8).

Plate 8: firecracker (2.7), clothes (3.0), explosion (4.9), clean
(5.5), dehydration (A4.3).

Plate 9: farm (4.1), currency (12.2), tranquillity (16.5), agrarian
(A6.2).

Plate 10: furniture (4.4), steel (6.0), refreshment (6.2), liquid
(7.3), container (9.5), centigrade (14.5).

Plate 11: clock (1.6), locket (3.0), numbers (3.4), engraving
(9.8).

Plate 12: hot (5.2), fear (7.4), nutrition (10.4), gorging (12.8),
poverty (13.9), mastication (A2.6), itinerant (A4.5), coercion
(A4.6), corpulence (A5.5), insatiable (A5.6).

Plate 13: telephone (2.1), crying (2.9), accident (3.0), vehicles
(9.5), destruction (10.0), portrait (10.2), communication (10.6),
consolation (13.4), negligence (14.3), bereaved (15.4), deleterious
(A6.2).

Plate 14: danger (5.6).

Plate 15: bed (1.6), newspaper (2.5), anaesthesia (11.7), immersion
(14.6), displacement (A5.0), perusing (A5.0).

Plate 16: propellers (3.7), harbor (8.1), locomotive (8.2), nautical
(16.5).

The following 85 words were chosen for Form B: 

Plate 1: vegetable (3.8), human (4.4), dessert (4.5), agriculture
(10.7), anti-socialness (13.2), segment (15.0), intimidation
(16.6), translucent (A2.5), depredation (A4.0).

Plate 2: phonograph (3.3), transport (8.4), terpsichorean (A6.0).

Plate 3: car (1.6), fight (2.8), paying (6.0), customer (6.3),
fuel (7.5), sale (7.9), purchase (10.4), transaction (14.6), aggressiveness
(A3.6).

Plate 4: panels (13.9), domicile (A4.0).

Plate 5: island (5.3), munificence (A5.7).

Plate 6: bird (1.6), fly (2.5), conveyance (14.5).

Plate 7: passion (12.5), impact (13.5), dialogue (13.6), discourse
(A4.5).

Plate 8: music (3.0), laundry (4.7), sudden (9.1), garment (9.8).

Plate 9: manufacturing (7.2), skyscraper (7.8), industrial
(10.0), pecuniary (A4.9).

Plate 10: spoon (1.8), razor (3.0), thermometer (4.1), mercury
(10.7), beverage (10.9), tonsorial (A4.4).

Plate 11: circle (2.7), sentiment (13.9), lobe (15.5), chronometer
(15.7), pendant (17.7).

Plate 12: meal (3.9), perspiration (9.6), humid (14.7), felony
(16.7), gourmand (A4.6), repast (A5.2), mendicant (A6.3).

Plate 13: cheerful (6.8), collision (7.4), sympathy (9.6),
mishap (11.1), propulsion (13.3), condolence (16.2), lacrimation
(A6.3).

Plate 14: policeman (2.5), listening (5.3), broadcast (5.9), uniform
(6.2), safe (6.5), protection (6.7), authority (10.4), gravitation
(11.8), catastrophe (12.0), constabulary (A3.2), fortuitous
(A6.4).

Plate 15: bathtub (1.6), operation (3.1), cleanliness (8.7), crisis
(12.5), somnolent (16.2), supine (A5.5).

Plate 16: train (1.5), airplane (1.8), intersection (8.5).

The point levels given with the words should be considered 
only as indices of difficulty, since actual average ages within 
age groups were not used in their calculation. The average level 
of Form A is 10.7 and that of Form B is 10.5. It can be seen that
the forms are closely comparable in difficulty for the whole 
group. 

Rough analyses of the incidence of parts of speech and of 
content areas were made for both forms combined. There are 18 
words which are direct derivatives of relatively common verbs, 
125 nouns, and 27 adjectives. Designating content areas arbitrarily,
there are 30 words of home or domestic import, 38
referring to nature or science, 60 relating to social processes, 11
commercial, 14 personal feelings, and 17 not readily classifiable
in this scheme. It would seem that the test puts a premium on 
the knowledge of names referring to society and social ac¬ 
tivities. 


Discussion 

To the extent that the occupational groups in the Denver area 
and a small rural area in Nebraska are typical of those in the 
United States as a whole, norms from this test can be considered 
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to be representative. There is, of course, some bias, as in all 
results based on controlled samples, but the sample is controlled 
at least as adequately as Wechsler’s, if not more so. In any case
it provides an excellent basis for item selection. 

The words finally chosen cover the range of verbal ability 
thoroughly, and discriminate well between ability levels as 
found in different age groups. Later papers show that the two 
test forms made up of these words intercorrelate highly, and
correlate well with other intelligence tests. The approximate age
placement of items is only intended to facilitate the testing 
mechanically, as the test is actually a point scale. Norms for a 
general white population (3, 6, 7) and for certain population 
subgroups (2, 5) will be given for both forms in later papers.
From a practical point of view the promise of the test is 
well borne out. Proficient testers were able to test three or four 
children an hour with both the picture vocabulary test and the 
1937 Stanford-Binet or the Wechsler vocabulary test. A high
interest level was in evidence on the part of most of the testees. 
It seems from the above that it has been possible to construct a
vocabulary test satisfactory for testing persons unable to speak 
or verbalize well. 


Summary 

Ammons and Huth (4) showed that it was possible to con¬ 
struct a picture vocabulary test of high reliability and validity. 
The present paper reports the procedure whereby items for such 
a test based on their 16 plates and covering the age levels from 
2 to 34 were obtained and validated. The general procedure was 
as follows: 

1. A set of 243 new items appropriate to the plates was 
listed and 48 of Ammons and Huth’s final items were retained. 

2. Of these 291 items 43 were eliminated by group discussion. 
A preliminary validation check was made on the remaining 248 
words, and 226 were retained. 

3. These 226 items were used to test 589 white American- 
born subjects ranging in age from 2 to 34 years. The sample was 
controlled by age levels for parents’ occupation or own occupa¬ 
tion, age-grade placement in school, and sex. 

4. On the basis of this standardization testing, 56 items
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were eliminated because of regional bias, failure to discriminate 
between successive age levels, too many items at a level, sex 
differences, ambiguity of picture denotation, or duplication of 
words on different cards. 

5. The remaining 170 items were divided into two equal- 
length forms which were found to be almost identical in difficulty.
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DOES FACE VALIDITY EXIST? 1 

SIDNEY ADAMS 
U. S. Civil Service Commission 

Face validity, in this paper, has the meaning of “appearance
of validity” in the language of Mosier. Mosier (6, p. 152)
says:

In this usage, the term ‘face validity’ implies that a test
which is to be used in a practical situation should . . . appear
practical, pertinent, and related to the purpose of the test . . .
it should not only be valid, but it should also appear valid.
This . . . is not validity in any usual sense . . . [but is] an additional
attribute of the test which is highly desirable in certain
situations.

This paper attempts a measurement of face validity by 
having a group of Federal government workers judge the extent
to which seven tests possessed true validity. The analysis
of the results attempts to answer two questions: 

(1) Does face validity exist in a form that can be reliably 
measured? 

(2) What relationship does face validity bear to true validity?
Is a test with the appearance of validity likely to be one with 
actual validity? 

Partial answers to both questions are to be found. Dr. Thelma
Hunt had done unpublished research on the guessed validity of 
general psychology examinations and its relationship to true 
validity. Smith (8) had students evaluate seven types of examinations
(essay, true-false, etc.) as to their suitability for
determining the grade in an education course. On the average,
each student’s rank of the validity of the test types correlated 
+.31 with any other student’s estimate of validity. This in¬ 
dicates that under the conditions of the experiment, face valid¬ 
ity of test-type, with the same subject-matter for all test 
types, is measurable, but not very reliably measurable. 

1 A discussion of the relationship between Face Validity and True Validity for the 
members of a group who tried out an experimental test battery. 

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The literature concurs in finding face validity a poor indica¬ 
tor of true validity. This has been pointed out by Mandell (4) 
and Mosier (6). O’Rourke (7) had judgments made on proposed
tests for the postal service. He demonstrated that one
test with great face validity possessed little or no true validity
in a tryout and statistical validation which followed the judgments.

The subjects for this study were 39 members of the Personnel
Department of the United States Veterans Administration. 
Their salary grades ranged from CAF 5 to CAF 12 ($2634.80 
to $5905.20). It is probable that all members of the group 
possessed considerable knowledge of test methods. 

The individuals participating in the study took eleven tests 
during two half-day sessions. To reduce interference with work, 
one session for each group of approximately 20 persons was 
held on one day, followed by a second session on the next day. 
During the second session of each of the two groups, testing 
was suspended. Each individual in the group was asked to 
rank the first seven of the tests, on the basis of their desir¬ 
ability for use in the selection and promotion of people for 
personnel jobs of the kind and level held by members of the 
group. Each examinee was told to write, on a sheet of paper, 
a code number which was used as his designation. These num¬ 
bers provided anonymous identification throughout the study. 
The examinees were then told to write, in the time-order in 
which they had been taken, the names of the first seven tests 
of the series. These were, in order: 

Administrative Judgment Test 
Interpretation of Data (Graphs) Test
Vocabulary Test 
English Expression Test 
Contemporary History Test 
Personality Estimates Test 
Word Identification Test 

The Administrative Judgment Test presents, for each ques¬ 
tion, a situation or problem in business or government organ¬ 
ization or procedure. Five solutions are offered for each problem.
The examinee is asked to choose the best of the five. The
Interpretation of Data Test requires the examinee to read and 
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interpret graphs and tables which show economic and social 
trends. The content of the third and fourth tests, Vocabulary 
and English Expression, is more or less self-explanatory. The 
Contemporary History Test is a factual examination on national
and world events between 1915 and 1948. It had more “background”
questions and fewer straight news questions than do
most current events tests. The introduction to the Personality
Estimates Test describes the personality traits of five individuals.
Each question then describes a certain action or states a
certain opinion. The examinee was asked to indicate which of
the five imaginary individuals would most probably have taken
the action or held the opinion. The Word Identification Test
was a type of vocabulary test in which the examinee was re¬ 
quired to identify a particular word needed to complete a 
sentence. The initial letter of the word, and the number of 
letters in the word, were given. 

The examiner described each test briefly, in order to recall
all tests to the examinees.² At the time of the rating of the tests,
a show of hands indicated that all examinees had reached the
sixth test, Personality Estimates. Those who had not reached
the Word Identification Test were asked to look at the sample
questions for this test. The various tests were not separately
timed; hence, at the time of the rating of the tests, the examinees
had reached different tests in the battery.

The examinees were asked to consider which one of the 
seven tests was the best for selection for, or promotion to, 
personnel jobs in the Veterans Administration, or similar per¬ 
sonnel jobs. The tests were to be ranked according to their
present state; no allowance was to be made for possible improvements
in the tests. The best test in each list was marked “1
best”, the next best as “2”, and so on to “7” for the poorest
test. Tied ratings were to be reconsidered; the examinee was
to break the tie arbitrarily if tests appeared tied after reconsideration.³
Examinees were cautioned that “face validity”

² The first seven, rather than all eleven, tests were used. This was done to allow the
tests to be rated at a convenient time in the schedule. Also, there would probably be
considerable confusion among the judges in comparing, by recall, as many as eleven
tests.

³ The examiner assigned a “4” or average rating to one omitted test. Two tests
remained tied, presumably after reconsideration by the rater. The examiner broke
the tie by tossing a coin.
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was not the major consideration in the selection of a good test; 
that a test might sometimes be a poor selective or promotional 
aid in spite of apparent validity. 

The distribution of the ranks assigned each test is shown in

TABLE 1

Frequency Distribution, Mean and Variability of the Rank in Face Validity of Seven
Tests by Veterans Administration Personnel Workers

                                      Rank
Test                        1    2    3    4    5    6    7       M       σ

Administrative Judgment    19    7    6    4    1    1    1    2.18    1.52
Interpretation of Data      2   14   13    2    0    5    3    3.28    1.71
Vocabulary                  1    1    4    8   12    9    4    4.85    1.37
English Expression          4    1    7   14    4    7    2    4.08    1.58
Contemporary History        3    3    4    3    7    9   10    4.92    1.92
Personality Estimates      10   10    2    0    4    1   12    3.74    2.51
Word Identification         0    3    3    8   11    7    7    4.95    1.45

TABLE 2 

Horst's Reliability for Rated Face Validity of Tests 
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39 

192 
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39 
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6.32 
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109a 

5460 

28.00 

140.00 

118 38 

21.62 

5689 





A 


B 


c 


Table 1. The mean and standard deviation of the rank assigned
each test are also shown in this table.
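
The mean and variability entries of Table 1 can be recomputed from a row of rank frequencies, as in the sketch below. It assumes the standard deviation divides by the number of raters (39) rather than by one less, which is the convention that matches the printed Vocabulary figures; the function name `rank_summary` is introduced here for illustration.

```python
from math import sqrt


def rank_summary(freqs):
    """Summarize how often a test was given each rank 1..len(freqs).

    Returns (number of raters, sum of ranks, mean rank, standard deviation).
    The standard deviation uses N in the denominator (an assumption).
    """
    n = sum(freqs)
    total = sum(rank * f for rank, f in enumerate(freqs, start=1))
    mean = total / n
    variance = sum(f * (rank - mean) ** 2
                   for rank, f in enumerate(freqs, start=1)) / n
    return n, total, mean, sqrt(variance)


# Vocabulary row as printed: frequencies for ranks 1 through 7.
n, total, mean, sd = rank_summary([1, 1, 4, 8, 12, 9, 4])
```

For the Vocabulary row this gives 39 raters, a rank sum of 189, a mean of 4.85, and a standard deviation of 1.37, in agreement with the table.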

In terms of Horst’s (2) formula for reliability, the reliability
of face validity ratings amounts to .911. This is shown in
Table 2, which is arranged according to Horst’s work sheet for
his formula. Thus, it appears that the measure of face validity
used does have reliability.
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(n = no. of ratings per test) (X = raw ratings) 
(A r = no. of tests) 
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r = i 
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118 . 38-784 ' 9 1 
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TABLE 3 

Variance Analysis of Ratings of Tests
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4, the mean rating 
(rank) of all tests 

0 w 

Souare of 

Column (3) 

Administrative Judgment . .. . 
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Contemporary Affairs 
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TABLE 4 

Rank Correlations between Rated Estimates of Test Validity for Random Pairs of Examinee-Raters
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This was confirmed by a variance analysis following the 
method of Mills (5) which showed the between-variance greater
than the within-variance at a probability within the one per
cent level; z was equal to 1.28. These calculations are shown
in Table 3. It thus appears that face validity is a definite
entity, whether or not face validity is related to true validity.
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Some tests do appear to this group of examinees to be more 
valid than others. 

A different approach to this problem is by determining
whether face validity shows any reliability. This is demonstrated
by showing the rank correlations between raters. Of the 39
raters, 38 were paired in random pairs. The final digits of
logarithms in corresponding positions on successive pages in a
logarithmic table were used to determine the pairing of the
individuals, each of whom had a code number. Rank correlations
(ρ) were determined for each pair. These correlations
are shown in Table 4. For computation of the standard error
of ρ, see (1, p. 123).
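
The pairing and correlation steps can be sketched as follows. Two assumptions are made here: a pseudo-random shuffle stands in for the logarithm-table digits the author used, and ρ is computed by the usual formula for untied ranks, 1 - 6Σd²/(N(N² - 1)). The function names and the seed are illustrative.

```python
import random


def spearman_rho(ranks_x, ranks_y):
    """Rank correlation between two rankings of the same items (no ties)."""
    n = len(ranks_x)
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def random_pairs(rater_ids, seed=0):
    """Shuffle the raters and pair them off, dropping one if the count is odd."""
    rng = random.Random(seed)
    shuffled = list(rater_ids)
    rng.shuffle(shuffled)
    if len(shuffled) % 2:
        shuffled.pop()
    return [(shuffled[i], shuffled[i + 1]) for i in range(0, len(shuffled), 2)]


# 39 code numbers give 19 pairs, with one rater left unpaired.
pairs = random_pairs(range(1, 40))
```

Two raters who rank the seven tests identically give ρ = +1, and two who rank them in exactly opposite order give ρ = -1.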


TABLE 5

Test                           S         S²

Administrative Judgment       85       7225
Interpretation of Data       128      16384
Vocabulary                   189      35721
English Expression           159      25281
Contemporary History         192      36864
Personality Estimates        146      21316
Word Identification          193      37249

ΣS²                                  180040


The mean ρ amounts to +.178, and the median to +.250.
Thus the relationship between the ranks of the tests by any two 
individuals, while very small, is positive. A further measure of 
the interrelationship of the pairs can be obtained by the use of 
the average intercorrelation formula. The use of this formula 
gives an answer to this question: Do the rank correlations com¬ 
puted in Table 4 appear representative of all the possible 
correlations which could be computed between the ranks of 
tests for different pairing combinations of subjects? A total 
of 741 correlations would be possible. The mean of these corre¬ 
lations has been computed by the average intercorrelation 
formula, as used by Smith (8), and explained by Kelley (3). 
The formula used was— 

$$\bar{r} = 1 - \frac{a(4N + 2)}{(a - 1)(N - 1)} + \frac{12 \sum S^2}{a(a - 1)N(N^2 - 1)}$$
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In this study, a is 39, the number of subjects. N is 7, the num¬ 
ber of ranks, which is equal to the number of tests. S is the
sum, for each test, of each rank times the number
of times the test was assigned to that rank. The computation
of ΣS² is shown in Table 5. The value of r̄ is +.33. This
agrees fairly well with the observed values of the 19 rank correlations.
Thus, by the use of both variance analysis and correlation,
the measurable existence of face validity has been
shown.
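
The average intercorrelation can be sketched in code under one explicit assumption: the printed formula is garbled, and it is read here as r̄ = 1 - a(4N + 2)/((a - 1)(N - 1)) + 12ΣS²/(a(a - 1)N(N² - 1)), where S is the sum of the ranks given to a test. The function name is hypothetical. The rank sums are those legible in Table 5.

```python
def average_intercorrelation(rank_sums, a):
    """Mean of all pairwise rank correlations among `a` raters.

    rank_sums -- for each of the N tests, the sum of ranks it received
    a         -- number of raters
    Uses the reading of the formula stated in the lead-in (an assumption).
    """
    n_tests = len(rank_sums)
    sum_sq = sum(s * s for s in rank_sums)
    return (1
            - a * (4 * n_tests + 2) / ((a - 1) * (n_tests - 1))
            + 12 * sum_sq / (a * (a - 1) * n_tests * (n_tests ** 2 - 1)))


RANK_SUMS = [85, 128, 189, 159, 192, 146, 193]   # S column of Table 5
r_bar = average_intercorrelation(RANK_SUMS, a=39)
```

Under this reading the rank sums reproduce the printed total ΣS² = 180040, but the formula evaluates to about +.21 rather than the printed +.33, so either the reading of the formula or one of the printed figures may be damaged. As a check on the reading itself, complete agreement among raters gives exactly 1.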

What is the relationship of face validity to actual validity? 
The true validity of the tests was measured by correlating the 


TABLE 6 

Relationship of True Validity to Face Validity
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test scores with the average rating of each participant. The 
ratings of each participant were made by other participants, 
who claimed knowledge of his work. Another measure of true 
validity used as a criterion was the civil service grade of the 
participants. 

In Table 6 the face validity and both kinds of true validity 
are shown for each of the seven tests. The rank of the test 
in each of these characteristics is shown. The rank correlation 
of the face validity is +.31 with true validity determined with 
a judgment criterion. It is +.50 with true validity computed 
against a salary-grade criterion.

In Table 7 it is seen that the relationship of the true validity 
of a test to its validity estimated by one individual is very 
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small and undependable. The numbers used in Table 7 are 
not the code numbers used in the examination. 

Conclusions 

1. Face validity appears to exist, at least for the tests, subjects
and conditions described in this paper. Examinees, exposed
to several tests, agreed with measurable consistency
that some of the tests appeared more valid than others.

2. Wide differences often exist between the judgments made
by different individuals as to which tests possess face validity, 
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ADMINISTRATION OF THE PURDUE PEGBOARD 
TEST TO BLIND INDIVIDUALS 


JAMES W. CURTIS 

Illinois Division of Vocational Rehabilitation 

Aptitude testing of the blind involves certain difficulties 
not always encountered in connection with normal individuals 
or individuals with other types of physical handicaps. Perhaps 
the two principal difficulties are the limitations in potential 
vocational placements, and the limitations in available testing 
instruments. Although the past decade has witnessed increas¬ 
ing attention to the development of suitable instruments, a 
substantial number of aptitude factors still present relatively 
difficult problems of determination, as applied to blind persons. 

In successful rehabilitation and job placement, the necessity 
for careful evaluation increases in direct proportion to the 
severity of the handicap. Improvisation often becomes a neces¬ 
sary part of the repertoire of the psychological tester, particu¬ 
larly in those instances in which blindness is the handicap. 

It was noted by the author, on numerous occasions, that job 
placement of blind individuals by the Illinois Division of Voca¬ 
tional Rehabilitation involved an element of finger-hand dex¬ 
terity not satisfactorily measured by commonly used adapta¬ 
tions of standard manipulative and dexterity tests such as the 
Pennsylvania Bi-Manual Work Sample and the Minnesota Rate 
of Manipulation Test. After some trial-and-error investigation,
it was determined that the Purdue Pegboard Test could be used, 
with very little special adjustment, in a quite satisfactory 
manner with blind individuals. It was found, moreover, that
the results so obtained provided a significant addition to the 
results obtained from other manipulation and dexterity tests, 
in standard use with the blind, such as the two mentioned 
above. 

The utility of any “standard test,” in conditions of specialized
use, is in inverse proportion to the complexity of the special

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adjustments necessary for such use. At the same time the 
fewer the necessary adjustments, the greater will be the ad¬ 
herence to the original standardized conditions and, conse¬ 
quently, the more significant will be the results from the stand¬ 
point of the original test purpose. Fortunately, the Purdue 

TABLE 1

Purdue Pegboard Test
Norms for the Blind
(N=70)

Percentile    Insertion    Assembly

    99           40           38
    95           .19          36
    90           .18          34
    80           .14          35
    70           3»           3°
    60           ^9           38
    50           16           26
    40           55           55
    30           5,1          53
    20           31           31
    10           n            18
     5           »4           H
     1           4            3

Pegboard Pest may be administered to the blind with only the 
following deviations from standard instructions: 

a. As the examiner introduces the test, he assists the subject 
in manually examining the board, locating the cups, exam¬ 
ining the pins, sleeves and washers, and identifying the rows 
of holes. 

b. At the start of each sequence, the tester places one pin (or 
one assembly) in the first hole of the row of holes to be used. 

In the two-hand sequence the operator places a pin in the 
first hole of both rows. The pin or pins so placed do not count
in scoring but serve as orientation points for the blind sub¬ 
ject. No additional deviation from the original instructions 
is necessary. It is desirable, however, to have the subject re¬ 
examine the sleeves and washers before proceeding with 
the assembly section. 

Up to the present time, 70 blind subjects have been tested
by the Purdue Pegboard Test in conformity with the instructions
outlined in the above paragraph. The age range was 18
to 44 years, with the distribution of ages approximating a bell
curve. There were 45 male subjects and 25 females. The IQ
range was 89 to 130, with an average of 107. Each of the 70
subjects had voluntarily contacted the Division of Vocational
Rehabilitation for rehabilitation. The 70 were tested in turn,
according to their date of application for services. Other than
this, no selective factors were in operation. Norms obtained
from these 70 cases are presented in Table 1, in terms of
percentiles.

The scores included in Table 1, under the designation
"insertion," represent the total number of pins inserted by right
hand, by left hand, and by both hands, for one trial. The scores
designated as "assembly" represent the total number of pieces
assembled in one trial, on the section of the test designated
"assembly" by the publisher.

A study of the insertion scores in Table 1, using the 1948 
Purdue Pegboard Profile Sheet for comparison, will show that 
the 99th percentile (blind norms) is equivalent to the 0.9th 
percentile, for industrial applicants. The 50th percentile (blind 
norms) is below the first percentile level, for industrial appli¬ 
cants. A comparison of the assembly section norms of Table 
1 with the 1948 Profile Sheet, shows that the 99th percentile 
(blind norms) is equivalent to the 80th percentile, and that the 
50th percentile (blind norms) is equivalent to the 15th percentile.

Although an insufficient period of time has elapsed to permit
a statistically reliable validation of the norms for the blind, 
on the basis of achievement in training or employment involv¬ 
ing finger-hand dexterity, preliminary results have indicated 
the strong advisability of utilizing such data as a part of the 
vocational testing complex. 

Summary 

The Purdue Pegboard Test was administered to 70 blind
individuals, subject only to minor modifications in administrative
technique. Tentative norms, based on these administrations,
were determined in terms of percentiles. Incomplete
results suggest a significant level of utility for measurements
obtained by this technique, in vocational guidance and
placement of blind individuals.



EVALUATING PSYCHOMETRIC PROFICIENCY 

FRANK M. DU MAS
American Council on Education

Introduction 

Individuals who have the responsibility of training applied
psychologists are often faced with the problem of evaluating
the ability of their proteges to administer individual tests.
There are two considerations involved: first, the evaluation of
the student as compared to other students; second, the evaluation
of the student as compared to a professional standard of
competency. Because of the guild-type training received, the
evaluation may be highly subjective. It would seem, therefore,
that an objective method of evaluating psychometric proficiency
would serve as a useful supplement to the generalized
subjective evaluation of the supervising clinician.

Time is important to the busy clinician. The rationalization
of the two procedures that follow was made with this constantly
in mind. The problem may be stated thus: Can an
objective procedure be worked out which the supervising
clinician can apply routinely in appraising psychometric
proficiency? Of the two tests that follow, the first can be made in a
minute or so and the second should seldom require more than
three or four minutes.

Analysis of the Standard Error of Measurement

The square of the standard error of measurement, $\sigma_{xx}^2$,
may be regarded as the variable¹ error variance of a test score.
This variance has two components: the variance due to the
psychometrician, $\sigma_p^2$, and the variance not due to the
psychometrician, $\sigma_{np}^2$, as

$\sigma_{xx}^2 = \sigma_p^2 + \sigma_{np}^2$. (1)

1 Errors may be classified as either variable or systematic. The present author would
like to point out that this paper does not evaluate systematic error. The present
method, therefore, is applicable only when an evaluation of variable error is desired.
The method suggested in this paper is meaningless when only systematic error is
present.
The variance $\sigma_p^2$ may be regarded as composed of two
components also: the variance due to the psychometrician himself,
$\sigma_{ph}^2$, and the variance due to the situation in which the
test is administered, $\sigma_s^2$. But since the trained
psychometrician is responsible for giving the test under specified
conditions, we may write

$\sigma_p^2 = \sigma_{ph}^2 + \sigma_s^2$. (2)

The variance $\sigma_{np}^2$ may be regarded as having two
components: the variance due to the testee, $\sigma_t^2$, and the
variance due to the test instrument, $\sigma_i^2$. That is,

$\sigma_{np}^2 = \sigma_t^2 + \sigma_i^2$. (3)

It is obvious, however, that $\sigma_i^2$ is usually infinitesimal
when the same test is used. When a parallel form is used
interchangeably, $\sigma_i^2$ may increase but usually only slightly.
Since $\sigma_i^2 \to 0$ we may disregard this quantity and write

$\sigma_{np}^2 = \sigma_t^2$. (4)

It follows that

$\sigma_{xx}^2 = \sigma_p^2 + \sigma_t^2$. (5)
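
Relations (1) through (5) can be collected into a single chain. The block below is only a compact restatement of the substitutions already made, using the same subscripts (p, np, ph, s, t, i); it adds no new assumption beyond treating the instrument variance as negligible.

```latex
\begin{align*}
\sigma_{xx}^{2} &= \sigma_{p}^{2} + \sigma_{np}^{2}
    && \text{by (1)} \\
  &= \sigma_{p}^{2} + \left(\sigma_{t}^{2} + \sigma_{i}^{2}\right)
    && \text{by (3)} \\
  &\approx \sigma_{p}^{2} + \sigma_{t}^{2}
    && \text{since } \sigma_{i}^{2} \to 0 \text{, giving (5)} \\
  &= \left(\sigma_{ph}^{2} + \sigma_{s}^{2}\right) + \sigma_{t}^{2}
    && \text{by (2)}
\end{align*}
```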

It is obvious that an unskilled psychometrician should be
less reliable than a skilled psychometrician, i.e., the square of
the standard error of measurement derived from test scores
obtained by an unskilled psychometrician, $\sigma_{xxu}^2$, should be
larger than the square of the standard error of measurement
derived from test scores obtained by a skilled psychometrician,
$\sigma_{xxs}^2$, as

$\sigma_{xxu}^2 > \sigma_{xxs}^2$. (6)

Since (6) would be true even if both the skilled and unskilled
psychometrician used the same testees, and the same test in
the same situation, it follows that (6) is due to the fact that
the variance due to an unskilled psychometrician, $\sigma_{up}^2$,
should be larger than the variance due to a skilled
psychometrician, $\sigma_{sp}^2$.

We should expect, therefore, $\sigma_{xxu}^2 > \sigma_{xxs}^2$ because

$\sigma_{up}^2 > \sigma_{sp}^2$. (7)

Now, the psychometricians who standardized a particular
test may be regarded as expert or skilled psychometricians.
Therefore, we may substitute the square of the standard error
of measurement as published in the standardization data,
$\sigma_{xx}^2$, for the variance $\sigma_{xxs}^2$ in relation (6) as

$\sigma_{xxu}^2 > \sigma_{xx}^2$. (8)

The quantity $\sigma_{xx}$, which is the published standard error of
measurement for a particular test, will be used in the procedures
that follow as the standard error of measurement desired
from a skilled psychometrician. The square of the quantity,
$\sigma_{xx}^2$, will be regarded as the variance of a population of
measures obtained by a skilled psychometrician on a single
individual. It follows that the degrees of freedom for $\sigma_{xx}^2$
will be $\infty$.


Criterion of Psychometric Proficiency I

Procedure.

a) Regard the first test score, $S_1$, obtained from a testee by a
psychometrician as the mean of a population of such measures.

b) Regard the second test score, $S_2$, obtained by the
psychometrician from the testee as a deviation from the mean.

c) Regard $\sigma_{xx}$ as the standard deviation of a normally
distributed population of such measures.

d) Criterion of psychometric proficiency I is attained when
the second test score does not deviate significantly from
the first test score, i.e., when the null hypothesis is
acceptable.

e) Test the null hypothesis by applying the following formula

$x = \dfrac{S_1 - S_2}{\sigma_{xx}}$ (9)

where x = deviation from the mean in terms of sigma as
the unit.

Enter the normal probability table with x and obtain the
probability area, A, lying between this deviate value and
the mean. Multiply this area by 2 and subtract this product
from 1. If the decimal point in the remainder be moved two
places to the right we then have the level of confidence at
which the null hypothesis may be rejected. These operations
may be summarized as follows:

L. C. = 100(1 − 2A). (10)


Evaluation: The assumption given in (a) above is implicit in all
test scores obtained in the clinic. It is the rule rather than the
exception that only one test score of a kind is obtained from
the testee and this score is considered as an estimate of the
mean of a population of such measures.

The level of confidence at which the null hypothesis may be
rejected is set by the supervising clinician. The severity of the
criterion may be increased as the training progresses by merely
setting the level of confidence for rejecting the null hypothesis
at a lower point, say, from the 10 per cent L. C. to the 40
per cent L. C.

Application: Let us assume that a group of psychometricians
are to be evaluated. Table 1 demonstrates the actual computation
necessary. Explanation of Table 1 follows:

Col. 1: Names of evaluated psychometricians.

Col. 2: The two test scores obtained by each psychometrician,
($S_1$, $S_2$). Let these be Wechsler-Bellevue IQ's.


TABLE 1

Evaluation by Criterion I

Col. 1    Col. 2             Col. 3    Col. 4    Col. 5    Col. 6
Smith     132, 126              6      5.674     1.06       29%
Jones     [illegible], 86       2      5.674      .35       73%
Brown     92, 108              16      5.674     2.82        1%


Col. 3: The difference between the two test scores obtained by
each psychometrician.

Col. 4: The published standard error of measurement for the
particular test being used; in this example the Wechsler-Bellevue
test of intelligence (1). This is $\sigma_{xx}$.

Col. 5: x as defined in formula (9).

Col. 6: Approximate level of confidence at which the null
hypothesis can be rejected.

The psychometricians may be compared as follows (see Col. 
6): Jones is best, Smith is next best, and Brown is the poorest 
in psychometric proficiency in regard to the Wechsler-Bellevue 
test of intelligence. 

If the criterion of proficiency were set at the 20 per cent
L. C., then Jones and Smith passed the criterion and Brown
failed the criterion.
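
The arithmetic of formulas (9) and (10) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the original procedure: the function name `criterion_one` is hypothetical, the normal-curve area A is computed with `math.erf` instead of being read from a printed table, and the absolute difference is used so that the order of the two scores does not matter. Brown's scores (92, 108) and the standard error 5.674 are those given in the text; the Smith and Jones pairs are assumed values chosen only to reproduce the differences of 6 and 2 listed in Col. 3.

```python
import math

SEM = 5.674  # published standard error of measurement quoted in the text


def criterion_one(s1, s2, sem):
    """Return (x, level of confidence in per cent) for two scores from one testee."""
    x = abs(s1 - s2) / sem                      # formula (9), sign ignored
    # Area A between the mean and the deviate x under the unit normal curve.
    area = 0.5 * math.erf(x / math.sqrt(2.0))
    level = 100.0 * (1.0 - 2.0 * area)          # formula (10)
    return x, level


# Score pairs: Brown's are from the text; the other two are assumed pairs
# whose differences (6 and 2) match Col. 3.
pairs = {"Smith": (132, 126), "Jones": (84, 86), "Brown": (92, 108)}
results = {name: criterion_one(s1, s2, SEM) for name, (s1, s2) in pairs.items()}

for name, (x, level) in results.items():
    print(f"{name}: x = {x:.2f}, L.C. = {level:.1f} per cent")
```

With these inputs the deviates round to 1.06, .35, and 2.82, in agreement with Col. 5. Ranking the psychometricians by the resulting level of confidence gives the same order as the text: Jones, then Smith, then Brown.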

Criterion of Psychometric Proficiency II

Procedure.

a) Regard two or more scores obtained by a psychometrician
from a single testee as a random sample from a normal
population of such measures.

b) Regard $\sigma_{xxp}^2$ as an estimate of the variance of this
population ($\sigma_{xxp}^2 = \Sigma d^2/(n - 1)$, where $\Sigma d^2$ is
the sum of the squared deviations from the mean of the sample
in (a) and $n$ is the number of scores in the sample).

c) Regard $\sigma_{xx}^2$ as an estimate of the variance of a normal
population of a set of measures obtained from the testee
by a skilled psychometrician.

d) Criterion of psychometric proficiency II is attained when
$\sigma_{xxp}^2$ is not significantly greater than $\sigma_{xx}^2$.


TABLE 2

Evaluation by Criterion II

Col. 1   Col. 2                  Col. 3             Col. 4             Col. 5   Col. 6
Joe      110, 118, [illegible]   32.19, d.f. = ∞    43.00, d.f. = 2     1.34     >5%
Tom      [illegible]             32.19, d.f. = ∞    26.67, d.f. = 3     1.21     >5%
Bill     93, [illegible]         32.19, d.f. = ∞    112.00, d.f. = 2    3.48      3%


e) Test the null hypothesis by first applying the formula

$F = \dfrac{\sigma_{xxp}^2}{\sigma_{xx}^2}$ (11)

where the d.f.² of $\sigma_{xx}^2$ may be taken at $\infty$ and the d.f.
of $\sigma_{xxp}^2$ is $n - 1$.

2 The degrees of freedom of $\sigma_{xx}^2$ is exactly the size of the standardization sample
minus one. Since the standardization sample is usually several hundred, the error
introduced by setting the d.f. always at $\infty$ is very, very small. The utility is that
only 1 line of the F table need be used.

Evaluation: The supervising clinician should first inspect
$\sigma_{xxp}^2$, and if $\sigma_{xxp}^2 < \sigma_{xx}^2$, the psychometrician
being evaluated is less variable than the skilled psychometrician
(at least on the basis of this estimate of his variance) and the
F test need not be made. However, if the supervising clinician
wishes to know whether or not the psychometrician being
evaluated is significantly less variable than the skilled
standardization psychometrician he may make the F test

$F = \dfrac{\sigma_{xx}^2}{\sigma_{xxp}^2}$ (12)

where the degrees of freedom are the same as in (11).

From Formula (6) it follows that we should expect
$\sigma_{xxp}^2 > \sigma_{xx}^2$. The application of formula (11), applied
only when $\sigma_{xxp}^2 > \sigma_{xx}^2$, will indicate whether or not the
in-training psychometrician is significantly more variable, and
therefore less reliable, than the skilled standardization
psychometrician.

Application: Let us assume that a group of psychometricians
are being evaluated. Table 2 represents the actual computation
necessary. Explanation of Table 2 follows:

Col. 1: Names of evaluated psychometricians.

Col. 2: Wechsler-Bellevue IQ's obtained by each psychometrician.

Col. 3: The square of the standard error of measurement as
published for the Wechsler-Bellevue test of intelligence,
i.e., $(5.674)^2$. This is $\sigma_{xx}^2$.
d.f. = degrees of freedom.

Col. 4: Estimated variance for a population of such measures
as sampled in Col. 2. This is $\sigma_{xxp}^2$.
d.f. = degrees of freedom.

Col. 5: Fisher’s F ratio. 

Col. 6: Level of confidence for rejecting the null hypothesis. 

The psychometricians may be compared as follows: Tom is 
best, Joe is next best and Bill is the poorest psychometrician 
in regard to the Wechsler-Bellevue test of intelligence. From 
26.67 < 32.19, we know that Tom is less variable and, therefore,
probably more reliable than even the standardization
psychometrician. However, Tom is not significantly more reliable.

If the criterion of proficiency had been set at the 5% L. C., 
then Tom and Joe passed the criterion and Bill failed the cri¬ 
terion. 
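
Criterion II can be sketched in the same way. The block below is an illustration under stated assumptions rather than the author's own computation: the names `chi2_sf` and `criterion_two` are hypothetical, and instead of reading one line of a printed F table it uses the fact that an F ratio whose other degrees of freedom are taken at infinity is a chi-square variate divided by its degrees of freedom. The three scores are hypothetical, chosen so that the estimated variance is 43.00; only the standard error 5.674 comes from the text.

```python
import math
from statistics import variance

SEM = 5.674  # published standard error of measurement quoted in the text


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer d.f."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (j + 0.5)
    return total


def criterion_two(scores, sem):
    """Return (estimated variance, F, level of confidence in per cent)."""
    var_p = variance(scores)       # sum of squared deviations / (n - 1)
    var_s = sem ** 2               # variance attributed to the skilled examiner
    df = len(scores) - 1
    if var_p == 0:
        return var_p, math.inf, 0.0
    if var_p >= var_s:
        f = var_p / var_s          # formula (11), d.f. = (n - 1, infinity)
        level = 100.0 * chi2_sf(df * f, df)
    else:
        f = var_s / var_p          # formula (12), d.f. = (infinity, n - 1)
        level = 100.0 * (1.0 - chi2_sf(df / f, df))
    return var_p, f, level


# Hypothetical scores from one testee (not taken from Table 2).
var_p, f_ratio, level = criterion_two([110, 118, 123], SEM)
print(var_p, round(f_ratio, 2), round(level, 1))
```

For these scores the F ratio rounds to 1.34 with 2 degrees of freedom, and the level of confidence is well above 5 per cent, so a psychometrician with this record would pass a criterion set at the 5% L. C.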



INTEREST AND PERSONALITY MEASURES OF
VETERAN AND NON-VETERAN UNIVERSITY
FRESHMAN MEN

KATHERINE K. FASSETT
University of Wisconsin

Fifty veterans and fifty-six non-veterans, all freshman men
coming to the University of Wisconsin Student Counseling
Center in 1946-48, have been investigated with respect to their
interest scores on the Strong Vocational Interest Blank and their
personality scores on the Minnesota Multiphasic Personality
Inventory. Both the Strong and the Multiphasic are routinely
administered to all students coming to the Counseling Center;
Multiphasic scores are K-corrected (3), and the Strong is scored
on thirty-four occupations in eleven groups. The ages of the
non-veterans ranged from 17 to 19 years with the median at 18; of
the veterans, from 20 to 30, with the median at 21. The length
of service of the veterans ranged from 24 to 72 months, the
median being at 33 and a half months. All had some service
outside of the continental United States. The academic
classifications of Letters and Science, Engineering, and Agriculture
are represented in both groups.

Interests, as measured by patterning on the Strong (1), show
no significant differences between the two groups of men. Judged 
by the total number of A and B+ scores, the veterans have 
more fully crystallized interests than do the non-veterans. This 
difference is significant beyond the one per cent level of con¬ 
fidence, the veterans giving more of the high scores than do the 
non-veterans. Such increase in crystallization of interest with 
added age has been found in previous investigations (4). In the 
case of the groups compared in the present study, there is no 
overlap in age; the difference here found might consequently be 
expected in terms of age alone. However, the studies on which 
such a difference has been demonstrated have not had the 
factor of war experience affecting the older group, and it has 
sometimes been thought that the service experience of veterans
may have hindered the maturing of their vocational interests 
which would otherwise have come about with added age. The 
comparison of high scores in the present study indicates that 
such maturing has gone on in the veteran group, although to 
what extent, as compared with men of similar age in non-war 
years, cannot be judged from this evidence. 

TABLE 1

Standard Deviations of Multiphasic T-Scores*
Comparison between Veterans and Non-Veterans

          Veterans     Non-veterans
          (N = 50)     (N = 56)
Scale     S.D.         S.D.           C.R.
Hs         9.22         6.93          2.01
Pt        15.22        10.88          2.35
Mf        12.41         8.71          2.47
D         15.12        10.66          2.42

* Scales which are not listed show differences significant at > .05 level of
confidence.

Multiphasic measures show no significant differences between
the two groups in central tendencies on any scale; both groups
have mean profiles which run close to the 50 t-score mean of the
general population. The mean t-score for no scale, for either
group of men, was higher than 59, or lower than 49. Greater
variability for the veterans was shown on several of the scales
(Table 1); and the veterans appear somewhat more often than
do the non-veterans in the score ranges indicative of possible 
personality deviations. Counting the total number of scores on 
all scales for each group, and computing the percentage of such 
total which falls at or above 75 t-score, the veterans show a 
greater percentage of high scores than do the non-veterans. On 
the Mf scale, a larger percentage of the veterans than of the 
non-veterans score at and above a t-score of 70. Both of these 
differences are significant beyond the one per cent confidence 
level. As the Mf scale is usually interpreted, the fact of the 
veterans scoring higher on the scale would indicate the presence 
of more feminine tendencies on their part than on the part of 
the non-veterans. The assumption is often held that young men 
of college age show some aggression to over-protection by their 
mothers, and take on some feminine attributes in order to
compete with the mothers. The means of both groups of men in 
this study run somewhat higher than the mean of the general 
population, but the fact that the veterans’ mean is significantly 
above the non-veterans’, can probably not be accounted for by 
mother relationships, since, on this basis alone, the non-veterans 
would be expected to run higher, inasmuch as they have re¬ 
cently been closer to their homes. The presence of more feminine 
tendencies on the part of the veterans might be due to the fact 
that, having been separated from considerable feminine contact 
for some time, they react to such contacts when entering the 
college situation in a coeducational institution—possibly show¬ 
ing an aggression towards, or competition with, the female 
student body which had been predominant on the campus be¬ 
fore the return of the veterans. The fact that these young men 
have chosen to undertake an education rather than to get some 
gainful employment immediately might be the result of one or 
both of two tendencies commonly considered to be feminine,
an interest in cultural pursuits, and a dependence, in this case 
perhaps a desire to be sheltered by society as represented by the 
Government and the University. Such a desire for dependence 
could very conceivably be the outgrowth of the youths having 
been pushed into the mature role of becoming aggressors for the 
sake of society, at an age and stage of development where many 
of them were not ready for such a role. 

The Si scale (2) indicates that both groups are generally like
average college students in tendencies toward social participation,
despite the fact that, as freshmen, they are new to the
University, and are, further, students who have demonstrated a
felt need for specialized help from the Counseling Center.
Students come to this Counseling Center on a purely voluntary
basis.

These conclusions cannot be applied to student groups as a 
whole without reservation, since this study was limited to 
freshman men; and even these may not be typical of freshman 
men as a whole, since little is known at present as to what "type"
of student seeks out the services of the Counseling Center. It 
does not seem unlikely, however, that the subjects of the present 
study are more or less representative of today’s student body; 
and inasmuch as psychometric tests are, at the present stage of
student personnel work, used more frequently on students who 
come for help to some specialized person or agency than on the 
entire college population, it is hoped that the present findings 
may be of some use to those who are attempting to aid students 
in their adjustment during the post-war period.



AWARD IN STUDENT PERSONNEL RESEARCH

C. GILBERT WRENN 
University of Minnesota 

Nominations for the Award in Student Personnel Research 
may now be submitted to the undersigned members of the
Committee on Awards of the Council of Guidance and Personnel
Associations. At a meeting in Toronto, Canada, July 9th and
10th, 1949, the Board of Representatives of the Council ap¬ 
pointed the Committee on Awards to report at the spring meet¬ 
ing in 1951. The award to be given is not a monetary considera¬ 
tion, but is to be in the form of a statement of recognition by 
the Board of Representatives of the Council of Guidance and 
Personnel Associations. It is planned to make announcement 
of the project or projects selected on Council Day each year 
and to give publicity concerning the selections through pro¬ 
fessional journals. It is hoped that such recognition will not only 
serve to call national attention to significant research already 
completed, but will stimulate further basic research in the field 
of student personnel. 

Although the Council of Guidance and Personnel Associa¬ 
tions is concerned with personnel work and personnel research 
in industry, business, government, and education, the projects 
to be considered for the first award are those which were com¬ 
pleted within the area of personnel work with students in ele¬ 
mentary school, high school, college, and university. 

The committee has decided to limit its consideration of re¬ 
search for the first award or awards to studies which were pub¬ 
lished in some form during the period July 1, 1946, through 
June 30, 1949. It is recognized that there is much valuable
research unpublished as yet or that may never be published, but
the inclusion of all unpublished studies would place an un¬ 
manageable burden upon the committee. Future committees 
may be more inclusive at this point and at the same time cover
a more restricted range of time. 

It may be necessary to grant two awards, one for research 
conducted by an individual, another for research conducted by 
an institution or agency. An Honorable Mention List will also 
be prepared. 

Nominations of studies may be made by any member of the 
constituent organizations of CGPA, whether the author of the 
study or not, to any member of the Awards Committee. The 
committee will depend rather heavily upon such nominations 
although it may in addition review the literature and supple¬ 
ment the nominations made from the field. Nominations may 
be made through July 31, 1950. 

The research may have been completed by an individual, a 
group of individuals, or an agency. The individual or individuals 
concerned need not be members of any constituent organiza¬ 
tion of CGPA. The nominations should clearly state the funda¬ 
mental contribution that the research study has made to student 
personnel work at any level, together with a statement of the limita¬ 
tions inherent in the research. The nominator should state as fully 
as possible why he thinks the particular study should be given the 
award. Wherever possible the nominator should send two or more 
copies of the research study for examination by the committee. 

It is essential to define what is meant by both "research” and 
"student personnel work.” The committee has adopted the defi¬ 
nition of research given in Carter V. Good’s Dictionary of 
Education: “Research is the careful unbiased investigation of a 
problem, based insofar as possible upon demonstrable facts and 
involving refined distinctions, interpretation, and usually some 
generalization.” The research to be considered may fall in either 
of two general classifications: studies involving directly any of 
the personnel services listed below; secondly, educational, psy¬ 
chological, or sociological studies of a more basic nature that 
contribute fundamentally to a change or development in any of 
the listed personnel services. 

The definition of student personnel work is condensed directly 
from a statement of the Study Commission of the Council of 
Guidance and Personnel Associations at the Chicago meeting 
in 1949. The services ordinarily to be interpreted as student
personnel services at various levels of education are the follow¬ 
ing: 

1. The interpretation of the school to the individual.

2. The maintenance of personnel records and the development
of their use.

3. The provision of competent counseling to assist the
individual in achieving his best educational, vocational, and
personal adjustment.

a. This service will have access to psychological testing and 
such other special diagnostic services. 

b. This service will give vocational information and will be 
closely correlated with the placement program. 

c. This service will supplement the counseling efforts of 
classroom teachers. 

4. Physical and mental health services. 

5. Remedial services in such areas as speech, hearing, reading 
and study habits. 

6. Supervision and integration of housing and food services. 

7. A program of activities designed to induct the individual 
into his new life and environment as a member of the school 
community. 

8. The encouragement and supervision of group activities 
significant to the individual. 

9. A program of recreational activities designed to promote
lifetime interests and skills appropriate to the individual.

10. The treatment of discipline as a learning experience.

11. Financial or similar aid.

12. Opportunities for securing help through part-time and
summer employment.

13. Assistance to the individual in finding appropriate em¬ 
ployment when leaving school and later in achieving oc¬ 
cupational adjustment and advancement. 

14. Enrichment of the life of the individual by providing learn¬ 
ing and experiences in the area of spiritual and ethical 
values. 

15. Provision of opportunities for making socially desirable 
adjustments in relation to the opposite sex. 

16. The continuing evaluation of student personnel services 
in order to make them more effective in the life of the in¬ 
dividual. 

The members of the Committee on Awards are: 

Dr. Mitchell Dreese 

George Washington University, Washington, D. C. 

Dean Clifford Houston 

University of Colorado, Boulder, Colorado 

Dr. Warren K. Layton



QUICK ESTIMATION OF MULTIPLE R

WILLIAM LEROY JENKINS
Lehigh University

By the short-cut method described below, the multiple R for
a test battery can be estimated in a few minutes with a degree
of accuracy sufficient for many practical purposes. Even if a
Doolittle solution is finally obtained, the method provides a
preliminary estimate and a useful cross-check against serious
blunders in computation.

Although intended only as a rough-and-ready approximation,
the short-cut has shown so far an astonishing agreement with
Doolittle multiple R's. In no case has the difference exceeded
.02 and in a set of 20 five-variable problems the mean
discrepancy was only .005.

Method with Example 

1. Arrange the matrix in descending order of validities.
Convert r's to E's using Table 1.

r-matrix

         Val.     B       C       D
A        .60     .50     .40     .30
B        .50             .20     .20
C        .40                     .20
D        .30

E-matrix

         Val.     B       C       D
A        20.0    13.4     8.4     4.6
B        13.4             2.0     2.0
C         8.4                     2.0
D         4.6

2. Compute the product of the validity E of the first test
(Primary) and the intercorrelation E between the first two
tests. Find this product on the ordinate scale of Figure 1¹ and
move across, interpolating between the diagonal lines for the
validity E of the second test (Secondary). From this intersection
move vertically to the scale of Added E. Add the Primary
to the Added E to obtain the multiple E for the first two tests.

Primary   Inter.   Product   Secondary   Added E   Multiple E
 20.0      13.4      268       13.4        3.7      23.7 (AB)

1 The chart in Figure 1 is too small for convenient use. The author will be glad to
furnish without charge a photoprint reproduction of the original 8½ × 11 chart on
cross-section paper.

TABLE 1

Conversion of r to E

  r      E       r      E       r      E       r      E       r      E
 .10    0.5     .30    4.6     .50   13.4     .70   28.6     .90   56.4
 .11    0.6     .31    4.9     .51   13.9     .71   29.6     .91   58.5
 .12    0.7     .32    5.3     .52   14.6     .72   30.6     .92   60.8
 .13    0.9     .33    5.6     .53   15.2     .73   31.7     .93   63.2
 .14    1.0     .34    6.0     .54   15.8     .74   32.7     .94   65.9
 .15    1.1     .35    6.3     .55   16.5     .75   33.8     .95   68.8
 .16    1.3     .36    6.7     .56   17.2     .76   35.0     .96   72.0
 .17    1.5     .37    7.1     .57   17.8     .77   36.2     .97   75.7
 .18    1.6     .38    7.5     .58   18.5     .78   37.4     .98   80.1
 .19    1.8     .39    7.9     .59   19.3     .79   38.7     .99   85.9
 .20    2.0     .40    8.4     .60   20.0     .80   40.0
 .21    2.2     .41    8.8     .61   20.8     .81   41.4
 .22    2.5     .42    9.3     .62   21.5     .82   42.8
 .23    2.7     .43    9.7     .63   22.3     .83   44.2
 .24    2.9     .44   10.2     .64   23.2     .84   45.7
 .25    3.2     .45   10.7     .65   24.0     .85   47.3
 .26    3.4     .46   11.2     .66   24.9     .86   49.0
 .27    3.7     .47   11.7     .67   25.8     .87   50.7
 .28    4.0     .48   12.3     .68   26.7     .88   52.5
 .29    4.3     .49   12.8     .69   27.6     .89   54.4




FIG. 1

Chart for Added E

The dotted lines in the upper left show the method of finding Added E for step 2
of the problem in the text.


3. Compute the product of the multiple E for the first two
tests (Primary) and the larger of the intercorrelations of the
third test with the first and second. Using this product and the
validity of the third test as Secondary, find the Added E and
the new multiple E.

Primary   Inter.   Product   Secondary   Added E   Multiple E
 23.7       8.4      199        8.4        1.7      25.4 (ABC)

4. Continue in a similar manner, always using the largest of 
the intercorrelations of the new test with those already form¬ 
ing the multiple. 

Primary   Inter.   Product   Secondary   Added E   Multiple E

 25.4      4.6      117        4.6         0.6      26.0 (ABCD)

5. Convert the final multiple E to multiple R by reference 
to Table 1. 

Multiple E 26.0 

Multiple R .67 (Doolittle .673) 
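
The arithmetic of steps 3 to 5 can be sketched in a few lines. The sketch rests on two assumptions that this part of the text does not state outright: that Table 1 follows E = 100(1 - sqrt(1 - r^2)), which reproduces the tabled values to within about a tenth of a unit, and that the intercorrelation is entered as its E value (so that 25.4 x 4.6 gives the Product of 117). The Added E figures are simply copied from the worked example, since they are read from the chart in Fig. 1 and are not computed here. The function names are illustrative only.

```python
import math


def r_to_e(r):
    """Convert a correlation r to E (assumed: E = 100 * (1 - sqrt(1 - r^2)))."""
    return 100.0 * (1.0 - math.sqrt(1.0 - r * r))


def e_to_r(e):
    """Invert the assumed relation: recover r from E."""
    k = 1.0 - e / 100.0
    return math.sqrt(1.0 - k * k)


def next_multiple_e(primary_e, added_e):
    """New multiple E = previous multiple E plus the Added E read from the chart."""
    return primary_e + added_e


# Worked example from the text: (Primary E, intercorrelation as E, Added E from chart).
steps = [(23.7, 8.4, 1.7), (25.4, 4.6, 0.6)]

for primary, inter, added in steps:
    product = primary * inter  # the value taken to the chart along with Secondary
    multiple = next_multiple_e(primary, added)
    print(f"product = {product:.0f}, multiple E = {multiple:.1f}")

# Step 5: convert the final multiple E back to a multiple R.
print(f"multiple R = {e_to_r(26.0):.3f}")
```

Under the assumed formula the final multiple E of 26.0 converts to an R of about .67, in line with the Table 1 lookup and the Doolittle figure quoted above.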

It will be observed that the process is one of building up
the multiple by treating the successive steps as individual
three-variable problems, which was the basis of a method*
previously published. In the present short-cut, however, the work
is considerably reduced, apparently without any serious loss of
accuracy.

*Jenkins, W. L. "A Quick Method for Multiple R and Partial r's," Educational
and Psychological Measurement, VI (1946), 273-286.

ERRATUM 

In the article by William Leroy Jenkins which appeared in the Spring, 1950, issue
of this journal the figure at the bottom of page 143 should be .79 instead of .89.
A STUDY OF GENERAL EDUCATION AT SYRACUSE
UNIVERSITY WITH SPECIAL ATTENTION 
TO THE OBJECTIVES 

N. M. DOWNIE 
State College of Washington 

C. R. PACE and M. E. TROYER 
Syracuse University 

During the academic year, 1947-1948, Syracuse University
carried out an all-university self-survey. Among the concerns
of this survey was a study of the status of the program of
general education of the University.

The questions that this study proposed to answer were: 

1. What objectives of general education do the students
believe to be important?

2. How much of these objectives do the students think that
they are achieving in their education at Syracuse?

3. How do the members of the faculty rate the importance
of these same objectives and what responsibility does each
staff member assume toward helping students achieve
these objectives?

4. What is the achievement in general education of Syracuse
students as measured by a standardized test of general
education?

5. How well-informed are these same students on current
events?

6. What are the opinions of the students on some
controversial or widely-discussed issues of the day?

This paper will be concerned with the first three of the above
questions. The findings of the last three will be reported in
later issues of this journal.

Members of the Senior Class, 1948, and of the Sophomore
Class, 1950, from the following five colleges of the University
participated in the study: Applied Science, Business
Administration, Fine Arts, Home Economics and Liberal Arts. An
entire day late in the fall term of 1947 was given over to the
testing program involving students. Each student took Form
X, the 1947 edition, of the Cooperative General Culture Test
and the Time Magazine Current Affairs Test, and reacted to an
opinion scale on current issues and to a list of objectives of
general education. These same objectives of general education were
included in the General Questionnaire completed by each staff
member as a part of the survey.

The Objectives of General Education. A list of objectives of
general education developed by a committee of the American
Council on Education and reported in A Design for General
Education for Members of the Armed Forces 1 was modified and
used as the basis for this part of the study. This list consists of
eighteen items as shown in Table 1.

The student was asked to consider each item in two ways.
First, “How important do you consider this knowledge, skill,
or understanding as a goal of your education?” Each item was
to be marked “very important,” “important,” “of some
importance,” “of hardly any importance,” or “of no importance.”
Second, the student was asked to answer the question, “How
much are you getting of this knowledge, skill or understanding
from college so far?” In this case the answer categories were
“much,” “some,” or “little or nothing.”

Each staff member was asked to consider each objective in
two ways. The first question was, “How important do you
consider this objective as a goal of general education for all
students?” This corresponded to the first ratings of the
students. The second question was, “What responsibility does your
area of instruction assume for helping students make progress
toward the attainment of this objective?” Each item was to
be marked “direct responsibility,” “incidental responsibility,”
or “outside my area of responsibility.” These objectives were
rated by 689 faculty members in the various colleges.

The responses of both faculty members and students in 
rating the importance of these objectives were converted into 
percentages and the results are shown in Table 1. 

1 Reports of Committees and Conferences, Series I, Vol. VIII, No. 18. Washington,
D. C.: The American Council on Education, June, 1944.


TABLE 1

Ratings of the Importance of the Objectives of General Education by Faculty
and Students (Percentages)

Item

 1 Developing good health habits
 2 Understanding the basis of personal and community health
 3 Writing clearly and effectively
 4 Speaking easily and well
 5 Developing social competence and social graces
 6 Understanding other people
 7 Preparing for a satisfactory family and marital adjustment
 8 Discovering personal strengths and weaknesses, abilities and limitations
 9 Understanding world issues and pressing social, political and economic
   problems
10 How to participate effectively as a citizen
11 Understanding scientific developments and processes and their application
   in society
12 How to think clearly, meet a problem and follow it to a right conclusion
   without guidance
13 Developing an understanding and enjoyment of literature
14 Developing an understanding and enjoyment of art and music
15 Understanding the meaning and values in life
16 Developing a personal philosophy and applying it in daily life
17 Making a wise vocational choice
18 Preparing for a vocation

F—Faculty
S—Students
*—Less than 1%

1—Very important
2—Important
3—Of some importance
4—Of hardly any importance
5—Of no importance


The responses of the seniors and the sophomores were first
compared. For the majority of the items there were no significant
differences between the responses of the two classes when
Chi square was used as a test of significant differences. However,
in rating the importance of item 5, “Developing social
competence and social graces,” the responses of the two classes
were significantly different at the 5 per cent level, with more
seniors rating the item “very important.” Item 10, “How to
participate effectively as a citizen,” was also found to be
significantly different at the 5 per cent level, again with more
seniors rating the objective as “very important.” Items 13 and
14, “Developing an understanding and enjoyment of literature”
and “Developing an understanding and enjoyment of art
and music,” were also found to be significantly different at the
5 per cent level. For both of these objectives, more seniors
than sophomores rated the items as being “very important.”

Considering students’ estimates of the amount of each objective
they believe they are achieving, item 4, “Speaking
easily and well,” was significantly different at the 1 per cent
level, with the seniors indicating that they were receiving
more of this objective than the sophomores. Item 7, “Preparing
for a satisfactory family and marital adjustment,” was similar
to item 4 in all respects. Item 9, “Understanding world issues
and pressing social, political and economic problems,” was
significantly different at the 5 per cent level, with the sophomores
recording that they were receiving more of this item than the
seniors. Item 12, “How to think clearly,” and item 14,
“Developing an understanding and enjoyment of art and music,”
were also significantly different at the 5 per cent level, with
the seniors receiving more in both cases.
Thus, in the few instances where there were differences
between seniors and sophomores, the seniors tended to regard
the objective as more important and to feel that they had made
more progress toward its attainment than the sophomores.
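
The kind of comparison described above can be sketched as a Chi square test on a table of response counts, with one row per class and one column per rating category. The counts below are invented purely for illustration, because the tabled percentages did not survive in this copy; the function names are likewise hypothetical. The critical values are the standard 5 and 1 per cent points of the Chi square distribution.

```python
# Standard upper critical values of Chi square: df -> (5 per cent, 1 per cent).
CRITICAL = {
    1: (3.841, 6.635),
    2: (5.991, 9.210),
    3: (7.815, 11.345),
    4: (9.488, 13.277),
}


def chi_square(table):
    """Return (statistic, degrees of freedom) for a rows x columns table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def significance(stat, df):
    """Name the strictest level reached, or None if not significant at 5 per cent."""
    five, one = CRITICAL[df]
    if stat >= one:
        return "1 per cent"
    if stat >= five:
        return "5 per cent"
    return None


# Hypothetical counts for one item, five importance categories per class.
seniors = [60, 80, 40, 15, 5]
sophomores = [40, 85, 50, 20, 5]

stat, df = chi_square([seniors, sophomores])
print(f"Chi square = {stat:.2f}, df = {df}, level = {significance(stat, df)}")
```

Categories with no responses in either class would need to be dropped or pooled before the table is passed in, since their expected frequency is zero.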

The ratings of the faculty and of the students were tested
for significant differences, again using Chi square. Twelve of
the items were rated differently by the two groups. These are
enumerated below. Item 1, “Developing good health habits,”
was considered to be more important by the students, 5 per
cent level. “Writing clearly and effectively,” item 3, was rated
more important by the faculty, 5 per cent level. Items 4, 5, 6
and 7, “Speaking easily and well,” “Developing social
competence and social graces,” “Understanding other people” and
“Preparing for a satisfactory family and marital adjustment,”
were considered more important by the students, all at the 1
per cent level. Items 9, 10 and 11, “Understanding world
issues and pressing social, political and economic problems,”
“How to participate effectively as a citizen” and “Understanding
scientific developments and processes and their application
in society,” were all rated as being more important by the staff,
all at the 1 per cent level. Items 16, 17 and 18, “Developing a
personal philosophy and applying it in daily life,” “Making a
wise vocational choice” and “Preparing for a vocation,” were
all considered more important by the students, all at the 1 per
cent level.

A comparison of how much of these objectives the students
believed they were achieving with the responsibility assumed
by the faculty members for the achievement of the objectives
is shown in Table 2. A study of this table shows that for
most of the objectives, the “Much” column for the students
and the “Direct Responsibility” column of the faculty contain
the smallest percentages. The exceptions to this over-all
pattern are found in item 6, “Understanding other people,” where
56 per cent of the students marked that they were receiving
“Much” and 33 per cent of the faculty considered this
objective to be their “Direct Responsibility”; in item 12, “How
to think clearly,” where 69 per cent of the staff considered this
objective to be their direct responsibility and only 37 per cent
of the students believed they were receiving much toward the
attainment of this objective; and in item 18, “Preparing for a
vocation,” where 55 per cent of the faculty considered this
objective to be their direct responsibility and 48 per cent of the
students stated they received much of it.

TABLE 2

Ratings of the Amount of the Objectives of General Education Received by the Students

F—Faculty
S—Students

1 For faculty, read “Direct Responsibility”
  For students, read “Much”

2 For faculty, read “Incidental Responsibility”
  For students, read “Some”

3 For faculty, read “Outside My Area of Responsibility”
  For students, read “Little or Nothing”


If we can assume that, in general, there should be some
degree of correspondence between the number of faculty
members assuming responsibility for an objective and the number
of students who feel they are making progress toward its
attainment, then we can compare these two ratings. When
Chi square was computed for each item, it was found that for
all but two items there existed significant differences at the 1
per cent level of confidence. The two for which no differences
were found were item 4, “Speaking easily and well,” and item
8, “Discovering personal strengths and weaknesses, abilities
and limitations.” A further analysis of these significant
differences showed that, except in items 11, 12 and 17, the students
were attaining more of these objectives than the faculty was
assuming responsibility for. This situation might then be an
indication of the results of participation in extra-curricular
activities and residence in fraternities and university houses as
leading to the realization of some of these objectives of general
education.
It should also be borne in mind that, when a student was
deciding whether or not he was receiving various amounts of
these objectives, he was thinking of no specific courses, but
was considering his entire program covering the four or two
years that he had been at Syracuse. Each faculty member was
considering only his limited area of responsibility. Hence, the
differences are actually much greater than indicated by the
data because of this difference in the scope of the educational
program being rated by the two groups.

The results were next considered in respect to the five
different colleges of the University. The ratings of the faculty
and students of the various colleges were tested for significant
differences using Chi square.

Comparing the results obtained on this check-list of general
education objectives within each of the five colleges studied
throws some light on the location of responsibility for their
attainment and the relative estimates of students’ progress
toward their accomplishment. For example, the two objectives
concerning personal and community health and the one
concerning family and marital adjustment were acknowledged as
direct responsibilities by a much higher per cent of the Home
Economics faculty members than by faculty members in other
colleges. Correspondingly, larger numbers of home economics
students than students in other colleges felt they were making
progress toward these objectives. In contrast, there were no
appreciable inter-college differences on the objectives concerned
with effective speech and writing. Preparation for effective
citizenship and understanding current issues were acknowledged
most frequently by faculty and students in Liberal Arts
and Business Administration. Except in the College of Fine
Arts, almost no faculty members were taking any direct
responsibility for helping students understand and enjoy art and
music; and almost no students, except in Fine Arts, felt they
were achieving much of this objective. These comparisons
between colleges are cited as illustrative. Insofar as they are
objectives of general education for all students there should
probably be a reconsideration of responsibility for their
promotion, and students in all colleges should feel that they are
progressing toward them.
Findings

As a result of having eighteen objectives of general education
rated by students and faculty members, the following
conclusions can be drawn:

1. The students of Syracuse University consider the attainment
of these objectives of general education as important
goals of their education.

2. In the curriculum, as it is now organized, the majority
of the students feel that they are making “some” but
not “much” progress toward the achievement of these
goals.

3. The Faculty of Syracuse University considers these same
objectives to be important goals of education. However,
there is a difference between the importance placed on
these various objectives by the faculty and students. The
ratings of twelve of these eighteen objectives by the Faculty
were significantly different from the students’ ratings
and eight were rated as being more important by the
students.

4. For the achievement of most of these objectives on the
part of the students, the majority of the faculty assume
no direct responsibility.

5. To the seniors and the sophomores most of these objectives
were of equal importance. Four of them, 5, 10,
13 and 14, were rated as being significantly more important
by the seniors.

6. For the majority of the objectives, the seniors and sophomores
felt that they were receiving about the same
amounts, except for items 4, 7, 12 and 14, which the
seniors reported to be receiving more of, and item 9, which
the sophomores received more of.

7. In the five colleges the ratings of the importance of the
objectives, the amounts that the students were receiving
and the responsibility the faculty members assumed for
the achievement of each varied considerably from college
to college. The importance placed on an objective and
the amount the students received more or less depended
on the curriculum of the individual college.



EDUCATIONAL GROWTH AS SHOWN BY RETESTS ON 
THE GRADUATE RECORD EXAMINATION 


JOSEPH C. HESTON 
DePauw University 

The Problem 

Educators would like to achieve some objective measure of
educational growth to demonstrate the progress of students
through a university curriculum. One such method of evaluating
this growth is offered through the use of The Tests of General
Education of the Graduate Record Examination. DePauw
University is now in a position to make an analysis of the
test-retest records of students who took the Examination in 1946
(as sophomores) and repeated the same Examination in 1948
(as seniors). DePauw is one of the universities where sufficient
students have been tested and then retested to make such an
analysis possible. Even here, however, the present analysis
must be restricted to women students, inasmuch as there were
not sufficient men sophomores tested in 1946 to make an analysis
of men’s records worthwhile. Therefore, the present analysis
deals with the sophomore versus senior records of 157 DePauw
women students.


Results 

The most obvious question in this connection would be, how
much gain do these students show on the eight Tests of General
Education prepared by the Graduate Record Office? In Table 1
will be found the mean score of these students as sophomores on
each of the eight tests and again as seniors on each test. It is
obvious from the column headed “Mean Gain” that in most
cases there was an appreciable gain. Only one area, Physical
Science, showed fundamentally no gain at all. This specific
result is not entirely unexpected, since very few of these women
students took additional physical science courses during their
final two years. The greatest gain was exhibited in the area of
Social Studies, followed rather closely by Effectiveness of
Expression and the General Education Index, derived from the
battery as a whole.

TABLE 1

Critical Ratios of G.R.E. Test Gains for 157 DePauw Women Tested as Sophomores
(1946) and Retested as Seniors (1948)


Gains thus exhibited may be taken at their face value, but
the question still remains as to their significance. Statistically
one approaches this problem by inquiring as to what degree of
certainty we know the gain may not have been due to mere
chance factors. The solution to this problem is through the use
of the critical ratio technique. The critical ratio, found by dividing
the difference by the standard error of the difference, may
be interpreted as follows: A critical ratio of zero means there are
50 chances in 100 that the gain was due merely to chance. A
critical ratio of 1.00 means there are 84 chances in 100 that the
true difference is greater than zero; 2.00 means there are 98
chances in 100; while 3.00 can be taken as a practical certainty
(100 chances in 100).
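
The "chances in 100" quoted above correspond to the area of the normal curve below the critical ratio. A minimal sketch of that conversion follows, assuming the sampling distribution of the difference is normal; the function names are illustrative and not taken from the article.

```python
import math


def critical_ratio(difference, se_difference):
    """Critical ratio: the difference divided by its standard error."""
    return difference / se_difference


def chances_in_100(cr):
    """Chances in 100 that the true difference exceeds zero (normal curve area)."""
    return 100.0 * 0.5 * (1.0 + math.erf(cr / math.sqrt(2.0)))


for cr in (0.0, 1.0, 2.0, 3.0):
    print(f"critical ratio {cr:.2f}: {chances_in_100(cr):.1f} chances in 100")
```

Rounded to whole numbers, the four values agree with the 50, 84, 98 and 100 chances in 100 given in the text.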

We find only one critical ratio indicating a difference that is
of no consequence, the one for Physical Science, where the mean
gain could have been very much a matter of chance. Of the
remaining eight critical ratios only two, those for Biological
Science and for Vocabulary, are below the 3.00 level, but these two
are sufficiently well above the 2.00 level as to mean about 99.5
chances per 100 of being true differences. The critical ratio is
not necessarily a measure of the size of the difference, but does
indicate if a difference is statistically significant and is not due
to chance factors. We may conclude, therefore, that the gains
shown on all the tests except Physical Science were sufficiently
appreciable to be well beyond the limits of mere chance and,
therefore, represent statistically significant progress from the
sophomore year to the senior year. Whether or not this progress
is as great as a faculty might wish is obviously still a matter of
question.

A second question one would raise in connection with this 
test-retest program is to what extent did the various tests agree 
with each other when repeated after two years? This analysis 
is not to be confused with the concept of statistical reliability. 

TABLE 2

Retest Correlations Between Sophomore (1946) and Senior (1948) Administration of
G.R.E. Tests to 157 DePauw Women

Test                            Correlation
                                Soph. vs. Senior

Mathematics                     .689
Physical Science                .719
Biological Science              .616
Social Studies                  .739
Literature                      .639
Arts                            .712
Effectiveness of Expression     .705
Vocabulary                      .851
General Education Index         .897


TABLE 3

Correlation Between General Education Index (GRE) and Scholastic Grade Averages
(PHR) of 157 DePauw Women

Variables Correlated            Correlation

Soph. GRE vs. Soph. PHR         .603
Soph. GRE vs. Senior PHR        .637
Senior GRE vs. Soph. PHR        .549
Senior GRE vs. Senior PHR       .604


Statistical reliability in the process of test construction is
determined by test-retest correlation where the examinations are
administered relatively close together, so that there is little
chance of actual change occurring. However, in this instance
the two-year lapse between the tests permitted considerable
opportunity for educational gain, not necessarily uniform from
student to student on each of the tests. This was due to the
situation whereby various students took different curricula and,
therefore, made more gains in some of the sub-tests than in
some of the others.
In Table 2 we have presented the retest correlations between
the sophomore and senior administration of the tests to these
same 157 DePauw women. Seven of the sub-tests exhibit retest
correlations of .75 or less over the two-year interval. This is not
surprising because of the varying degrees of educational growth
in each area for each student. Vocabulary did achieve a retest
correlation of .85, which indicates considerable consistency from
one administration to the next. Vocabulary is not a
subject-matter area, but rather an index of general intelligence, and
would be expected to show higher retest correlation than the
specific subject-matter areas. The General Education Index,
exhibiting a correlation of .897, shows that the battery as a
whole is remarkably consistent, even when administered with a
two-year interval of time between test and retest. This high
retest correlation for the battery as a whole may be interpreted
as meaning students earning a score in the top brackets in the
sophomore year would almost certainly earn scores in the top
bracket as seniors. In other words, the matter of gain is a
relative factor and the degree of gain is marked by considerable
consistency throughout the battery as a whole.

A third problem in which one would obviously be interested
is the relationship between the GRE General Education Index
and scholastic grade averages at DePauw for these students.
In Table 3 we have presented the correlation coefficients
between GRE Indexes and grade averages (PHR) for the four
possible combinations. Three of these figures are .60 or higher,
indicating a strong degree of relationship between GRE scores
and university grades. It is interesting to note that the highest
correlation is exhibited between GRE index for sophomores and
their final senior-grade averages. In this sample at least, it
seems it would have been sufficient to give the GRE to
sophomores and then to predict final senior-grade averages without
recourse to administering the tests again to the seniors. For grade
prediction this process would have been sufficient, but would not
have revealed the growth as exhibited in Table 1.
OF THE GRADUATE STUDENT
University of Michigan

Introduction 

This is a study of the assessment of the potentialities of the
graduate student and of the criterion of success in graduate
school. The study was undertaken at the request of the Executive
Board of the Horace H. Rackham School of Graduate
Studies of the University of Michigan, and all data were
collected within that institution. The study is one of a series of
investigations conducted for the administration of the University
of Michigan.

Grades as a Criterion of Success in Graduate School

Studies of the prediction of academic achievement are legion,
but few of them are particularly concerned with the characteristics
of the criterion of academic success. Since it is commonly
believed that the failure of tests to make accurate academic
predictions is a result of the instability of students’ average grades
from one semester to the next, it seemed wise to collect some
evidence on that point at the beginning of the present study.
This was done by finding the correlation between the grades of
students in two successive semesters in their field of specialization.
From these correlations it is possible to estimate the number
of semesters of graduate work that would have to be taken
in order for the grade-point average to be a stable criterion of
graduate success.1 Table 1 summarizes these data.

1 A stable grade-point average is arbitrarily defined as one that would correlate 0.9
with another grade-point average computed from an equal period of graduate studies.

The estimated correlation between grades for successive years
is based on the Spearman-Brown formula. It is fairly obvious
from these data that the average grade for two semesters’ work
is a more stable criterion in some areas than in others. It is
theoretically quite impossible to predict with any accuracy
from test scores or other data the grades which a graduate
student of engineering will obtain during a year of graduate studies
since grades in that field are highly unstable for a given
individual. The stability of average grades in other areas follows
that found in previous studies, with the highest in the physical
sciences and the lowest in education.
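
The two estimates described above follow from the Spearman-Brown formula. The sketch below assumes the standard form of that formula and takes the footnote's 0.9 as the target; it is checked only against the Physical Sciences semester-to-semester correlation of .68, and the function names are illustrative.

```python
def spearman_brown(r, k):
    """Predicted correlation when the period of observation is lengthened k times."""
    return k * r / (1.0 + (k - 1.0) * r)


def semesters_needed(r, target=0.9):
    """Number of semesters whose average grade would correlate `target` with the
    average from another equal period, given a one-semester correlation r."""
    return target * (1.0 - r) / (r * (1.0 - target))


r_semester = 0.68  # Physical Sciences, successive semesters
print(f"successive years: {spearman_brown(r_semester, 2):.2f}")
print(f"semesters needed: {semesters_needed(r_semester):.1f}")
```

For r = .68 this gives a year-to-year correlation of about .81 and a little over four semesters, which agrees with the tabled entries of .81 and 4.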

Background of the Present Study 

The accurate assessment of the student’s potentialities for
profiting from work at the graduate level is important for two
reasons. First, there is a need for improving selection procedures.


TABLE 1

Stability of Grades

Field                  N     r, successive   Estimated r,       Semesters
                             semesters       successive years   required

Social Studies                               .79                5
Physical Sciences            .68             .81                4
Engineering            77    .28             .44                23
Languages and Lit.     88    .65             .79                5
Education              68    .53             .69
Second, once the student has been admitted it is important to
be able to determine how far he should continue graduate work
so that he can plan an appropriate program. Every graduate
school is familiar with the student who carries a doctoral
program to an advanced stage before it is realized that an
alternative program would have been a wiser choice. In order to
prevent the occurrence of such cases it is necessary to establish a
system for appraising the potentialities of the student at an
early stage in his career.

There have been two main approaches to the assessment of
the graduate student by means of tests. One is that of appraising
his “background.” In this approach it is assumed that it is most
important for the student to enter graduate school with a
certain body of information from a variety of subject-matter fields.
It makes the additional assumption that a liberal education at
the college level supplies the student with a relatively fixed
body of information which can be measured by tests such as
the Graduate Record Examination. The philosophy of education
implied by this approach is more in keeping with the goals of
higher education of the last century than with those of the
present decade.

The other approach to the assessment of the graduate student
is that of determining the extent to which he exhibits the
psychological processes and intellectual skills which are important
for graduate work. This approach was given limited recognition
in the Graduate Record Examination in the verbal factor test
and is implicit in current proposals for the revision of that
examination. It is also illustrated by the use by graduate schools
of high-level tests of general ability such as the C.A.V.D. scale,
and to a lesser extent by the Miller Analogies Test.

There are several major reasons at the present time for
avoiding the appraisal of the prospective graduate student in terms
of his knowledge. First, it is impossible to identify a common
body of knowledge which all graduate students should possess,
and this will become a progressively more difficult task with the
growing emphasis on intellectual skills and arts as major
outcomes of a liberal education at the college level. This is true not
only insofar as “general background” is concerned but also in
the student’s field of specialization. Second, it would be most
undesirable for graduate schools to indicate that a given body
of knowledge was a requirement for graduate work. This would
have the evil effect of the graduate schools controlling the
undergraduate curriculum in the same way as the colleges have
often had the unfortunate role of controlling the curriculum of
the secondary school. Third, even if a body of essential
knowledge could be identified, there would be no up-to-date
examinations for measuring the extent to which this knowledge had
been acquired.

For these reasons, the assessment of the graduate student
must be largely in terms of the extent to which he has mastery
of the intellectual arts and skills necessary for success in
graduate work. Following the pattern of the American Council on
Education Psychological Examination, the present investigators
devised a test for a higher level of ability which would yield a
linguistic and quantitative score and which would measure
processes hypothesized to be important in graduate success. The
test will be referred to as the Academic Aptitude Test, Graduate
Level.


The Nature of Test 

All items in the test were multiple-choice with five
alternatives. The major portion of the test was liberally timed so as to
eliminate the factor of speed. The items were grouped into five
parts which are described below:

Part I, Vocabulary.—Eighty words were selected from the
technical terminology of the common areas of specialization of
graduate work including Physical Sciences, Biological Sciences,
Social Studies, Languages, Law, and Philosophy.

Part II, Reading Comprehension.—This test is called a reading
test only from custom. It requires the student to reason
rather than to memorize what has been read. Questions of the
following type which follow some of the passages indicate the
kind of mental process which the test involves:

Which one of the following individuals is most likely to have 
written the above passage? 

What is the most important practical oversight in the plan 
suggested? 

What is the main purpose of the author? 

What does the author mean by "regions larger than any 
empire of antiquity"? 

Part III, Verbal Reasoning.—This test involves processes 
such as the identification of erroneous assumptions, 
inconsistencies, justifiable in contrast to unjustifiable 
conclusions, and the making of inferences which are probably 
but not necessarily correct. 

Part IV, Quantitative Reasoning.—This test involves reasoning 
with numbers, but does not involve mathematics much beyond 
that taught in junior high school. A few, but not many, of 
the problems place a considerable emphasis on the ability to 
understand descriptions of complex data. 

Part V, Numerical Ingenuity.—In this test the examinee is 
presented with a series of numerical problems each of which 
can be solved by a short method or a long method. The examinee 
is instructed to look for the short method of solving each 
problem. If he does not see the short method at once, he is to 
pass on to the next problem. This section of the test, unlike 
the other sections, emphasizes speed. 

The original plan of the test was to combine the scores on the 
first three parts in a verbal-factor score and to add together the 
scores on the last two parts to provide a numerical reasoning 
score. The basis for this partition of the test is found in the 
practice followed by the American Council on Education Psy¬ 
chological Examination and in the Differential Aptitude Battery 
both of which provide verbal and numerical scores which have 
been found to have differential predictive value. 

The reliabilities for the verbal and numerical scores, calculated 
by means of Kuder-Richardson Formula 21, were found to 
be 0.86 and 0.86 respectively. The reliability for the total test 
calculated on the same basis was 0.90. The correlation between 
the verbal and numerical sections was found to be 0.20. These 
correlations are based on 484 cases. 
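
The reliability computation mentioned above can be sketched as follows, assuming the formula intended is Kuder-Richardson Formula 21, which needs only the number of items, the mean score, and the score variance. The function name `kr21` and the numbers in the example are illustrative assumptions; the report does not give the means or variances of the test.

```python
def kr21(n_items: int, mean: float, variance: float) -> float:
    """Kuder-Richardson Formula 21 reliability estimate.

    r = (k / (k - 1)) * (1 - M * (k - M) / (k * s^2))
    where k is the number of items, M the mean score, s^2 the score variance.
    """
    if n_items < 2:
        raise ValueError("KR-21 needs at least two items")
    if variance <= 0:
        raise ValueError("score variance must be positive")
    k = n_items
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))


# Hypothetical test: 100 items, mean score 60, score variance 200.
example = kr21(100, 60.0, 200.0)
```

Formula 21 treats all items as equally difficult, so it tends to give a somewhat lower figure than an item-by-item estimate would.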

The correlation between the verbal and numerical section is 
much lower than that found with the American Council on 
Education Psychological Examination. In the latter case the 
correlation between the quantitative and linguistic sections is 
probably raised considerably by the fact that both sections 
involve a speed factor. 

Validation Procedure 

During the academic year 1948-49 the test was administered 
to 1 ,hi graduate students. About half of these students were 
in their first year of graduate work and the remainder had been 
in graduate school for varying lengths of time. For the purposes 
of this study, only those students who were registered for 6 or 
more hours of courses for graduate credit during each semester 
were included in the investigation. In addition, it seemed de¬ 
sirable to eliminate those foreign students who had taken their 
undergraduate work in non-English speaking countries. These 
eliminations reduced to 484 the number of cases included in the 
study, and these cases were distributed over the various areas 
of graduate study in the manner shown in Table 2. 

Some of these groups are much too small for study; 
consequently, it seemed advisable to eliminate the biological 
science, library science, and miscellaneous groups from further 
study. 

Correlation of Test Scores With Grades 

Table 3 summarizes the correlation of test scores with aver¬ 
age grades. 

Certain facts emerge from this table which throw considerable 
light on problems of the selection and guidance of graduate 
students. First, the correlations of test scores with grades in 


TABLE 2 

Distribution of Students by Field 

Field                          No. in Each Field 

Social Studies                        68 
Physical Sciences                     86 
Engineering                           77 
Languages and Literature              88 
Education                             68 
[field name illegible]                28 
[field name illegible]                30 
[field name illegible]                39 


TABLE 3 

Correlations of Test Scores With Grades 

                                     Part                       Verbal    Num. 
                          I      II     III     IV      V       Score     Score 
                                        Verb.   Num.    Num.    Parts     Parts    Total 
                    N   Vocab.  Read.   Reas.   Reas.   Ingen.  I,II,III  IV,V     Score 

Social Studies      68   .46    .2[?]   .09     .10     .05     .36       .09      .31 
Physical Science    86   .08    .46     .31     .33     .09     .18       .[?]7    [?] 
Engineering         77   .04    .10     .03     .14     .01     .08       .10      .10 
Education           68   .45    .42     .38     .16     .28     .49       .24      .47 
Lang. and Lit.      88   .41    .27     .34     .3[?]   .[?]4   .47       .37      .50 


engineering are negligible in magnitude. This is in accordance 
with expectations since the criterion represents for practical 
purposes an unpredictable variable. 

Second, there are great differences between the areas of study 
in the abilities associated with grades. In social studies and 
languages the part which has the highest predictive value in¬ 
volves vocabulary rather than reasoning. In the physical sci¬ 
ences, on the other hand, it is the sub-tests involving reasoning 
which have the highest correlation with grades. This does not 
mean that success in the physical sciences depends upon 
reasoning abilities while success in languages and social studies 
does not. However, it is consistent with the observation that 
examinations given in the physical sciences call for problem solving 
and reasoning abilities while those given in languages and the 
social sciences commonly call for memory of facts rather than 
thinking skill. It is probable that the achievement of under¬ 
standing in most fields involves reasoning, but in many fields 
grades are assigned on the basis of the amount of accumulated 
knowledge rather than on the basis of the amount of under¬ 
standing achieved. 


TABLE 4 

Multiple Correlation Between Test Scores and Average Grades 

Field                            R 

Social Studies                  .49 
Physical Sciences               .52 
Engineering                     .17 
Education                       .54 
Languages and Literature        .50 


TABLE 5 

Comparison with Miller Analogies Test 

                             Multiple correlation     Correlation of Miller 
                             of Academic Aptitude     Analogies Test with 
Field                        Test with average grade  average grade 

Social Studies                       .49                     .18 
Physical Sciences                    .51                     .38 
Engineering                          .17                     .09 
Education                            .54                     .22 
Languages and Literature             .50                     .34 


Third, the original hypothesis that a verbal and a numerical 
score represented useful measuring categories does not seem to 
be consistent with the data. Inspection indicates that differ¬ 
ential prediction would be much more effective if the sub-tests 
were grouped, not into numerical and verbal categories, but 
into reasoning and vocabulary categories. 

Fourth, since there is great variation in the extent to which 
each of the sub-tests predicts success in each of the areas of 
study, it would seem that scores on sub-tests should be differ¬ 
entially weighted before they are added together in order to 
maximize the accuracy of prediction for a given area of study. 

Correlations Between Weighted Scores and Average Grades 

Table 4 shows the correlations between a composite of 
sub-scores weighted to give maximum predictions and average 
grades. These multiple correlations are, with the exception of 
the one for engineering, of sufficient magnitude to justify the 
use of the test as one aspect of the assessment and guidance of 
the graduate student. 

It is interesting to compare the above correlations with those 
found with the Miller Analogies Test using a similar criterion of 
average grade over a year’s work in the Horace H. Rackham 
School of Graduate Studies. This comparison is shown in Table 

TABLE 6 

Beta Weights for Parts of Test 

TABLE 7 

Intercorrelations of Parts of Test 

                        Part II    Part III   Part IV    Part V 
                                   Verbal     Num.       Num. 
                        Reading    Reason.    Reason.    Ingenuity 

Part I, Vocabulary      .[?]4      .34        [?]        .04 
Part II                            .48        .35        .10 
Part III                                      .47        .25 
Part IV                                                  .64 


5. It may be noted that the Miller Analogies Test represents a 
combination of vocabulary and reasoning but it does not per¬ 
mit the differential weighting of these two variables. 

Unfortunately, it is not possible to compare the Academic 
Aptitude Test with the results of the Graduate Record Examina¬ 
tion since the only validation data on the latter is of pre-war 
vintage, and it is commonly recognized that the predictive 
value of tests in colleges has for unknown reasons declined in 
recent years. 

The Weights of the Parts in the Composite Scores 

It is instructive to examine the weights (beta weights) given 
to the various parts of the test in the optimum prediction of 
grades. These weights indicate the contribution which each 
part makes independent of all other parts. They are reported 
in Table 6. 
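
How beta weights and a multiple correlation follow from simple correlations can be sketched for the two-predictor case. This is a simplification: the report weights five parts, whereas the sketch below assumes only two predictors (for instance a vocabulary score and a reasoning score). The function names and the input values are illustrative assumptions and are not taken from the tables.

```python
import math


def two_predictor_betas(r1y: float, r2y: float, r12: float) -> tuple[float, float]:
    """Standardized (beta) weights for two predictors of one criterion.

    r1y, r2y: correlations of each predictor with the criterion.
    r12: correlation between the two predictors.
    """
    denom = 1.0 - r12 ** 2
    if denom <= 0.0:
        raise ValueError("predictors must not be perfectly correlated")
    beta1 = (r1y - r2y * r12) / denom
    beta2 = (r2y - r1y * r12) / denom
    return beta1, beta2


def multiple_r(r1y: float, r2y: float, r12: float) -> float:
    """Multiple correlation of the optimally weighted composite."""
    beta1, beta2 = two_predictor_betas(r1y, r2y, r12)
    r_squared = beta1 * r1y + beta2 * r2y
    return math.sqrt(max(r_squared, 0.0))


# Hypothetical inputs: predictor validities .49 and .24, intercorrelation .20.
betas = two_predictor_betas(0.49, 0.24, 0.20)
composite_r = multiple_r(0.49, 0.24, 0.20)
```

A predictor with a low validity can still receive a useful weight when it is nearly uncorrelated with the other predictor, which is why the weights are examined separately from the simple correlations.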

This table again suggests rather strongly that in the social 
studies, education, and languages, grades are based on different 
criteria than they are in the physical sciences. It indicates also 
that a test of aptitude for graduate students would be better 
structured by having a vocabulary and a reasoning section 
rather than a verbal and a numerical section. In some ways this 
is rather surprising since the intercorrelations of the sub-tests 
in the battery show that, in general, the sub-tests cluster into 
the verbal and numerical categories. This is illustrated in Table 
7 which shows the interrelationships between the sub-tests cal¬ 
culated on the basis of 484 cases. 

Summary and Conclusions 

The present study was concerned with the assessment of the 
student’s aptitude for graduate work. It was demonstrated that 
if success in graduate work is measured by grades then it is pos¬ 
sible only in certain fields to make predictions of success. In the 
present case, grades in engineering lacked homogeneity from 
one semester to the next and did not constitute a reasonably 
predictable criterion. This statement should not be generalized 
to imply that in other graduate schools grades in engineering 
would lack consistency from one semester to the next. However, 
it does imply that graduate schools should from time to time 
check on the stability of average grades since they are a basis 
for awarding degrees. If grades are unstable from one semester 
to the next then any degree awarded on the basis of them is 
awarded arbitrarily. 

The predictive value of a test designed to give a verbal-ability 
and a numerical-ability score was studied. It was found, 
however, that the test did not give best predictions when this type 
of partitioning was used. The evidence indicates that it would 
be better to partition the test into a vocabulary and a reasoning 
section and to weight these parts differentially for making pre¬ 
dictions in various fields. This finding is important in view of 
the fact that two of the major testing organizations have an¬ 
nounced plans for providing a verbal ability and numerical 
ability test for the same level of difficulty as the present test. 



MEASURING ORIGINALITY IN THE 
PHYSICAL SCIENCES 1 


MILTON M. MANDELL 

Examining and Placement Division, United States Civil Service Commission 

The United States Civil Service Commission started in Oc¬ 
tober, 1947, a study of selection methods for physicists, 
chemists, and engineers. The following report is an interim one 
which describes the selection methods which seem to predict 
best the ability to perform research work in the physical 
sciences, based on a try-out of tests on more than 600 chemists, 
physicists, and engineers. 

The data below are presented in three forms. In the first 
place, there are presented correlations between test scores and 
ratings by colleagues and supervisors on a five-point graphic 
rating scale on an item described as: "Originality of thinking— 
what is his ability in creative thinking? How original is he in 
his approach to problems when originality is necessary?” The 
second method used was to identify those scientists who were 
engaged in basic research work; this was done in order to 
determine the correlation between test scores and job per¬ 
formance on the basis of an over-all evaluation on a graphic 
rating scale by colleagues and supervisors. The third method 
was to determine the significance of the difference between the 
mean scores of research personnel and those of non-research 
personnel on the tests used. 

Where the criterion was the summation of ratings of col¬ 
leagues and supervisors, the method was to add together the 
ratings by all colleagues and supervisors and to divide these 
ratings by the total number of ratings, obtaining an average 
unweighted score. 
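
The criterion computation described above, and its correlation with a test score, can be sketched as follows. The report states in a footnote that its correlations are Pearson product-moment coefficients unless otherwise noted; the function names and any data passed to them here are hypothetical.

```python
import math


def mean_rating(ratings: list[float]) -> float:
    """Unweighted average of all colleague and supervisor ratings for one person."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / len(ratings)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: test scores and rating sheets for four scientists.
test_scores = [62.0, 75.0, 48.0, 90.0]
rating_sheets = [[3, 4, 3], [4, 4], [2, 3, 2, 3], [5, 4, 5]]
criterion = [mean_rating(sheet) for sheet in rating_sheets]
validity = pearson_r(test_scores, criterion)
```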

1 This study was carried on by the Civil Service Commission as part of its regular 
program for the improvement of selection methods. Part of the work that was done on 
this project was performed by persons employed by the American Council on Education 
in its contract with the Scientific Personnel Division of the Office of Naval 
Research. Neither organization assumes any responsibility for the contents of this report. 

A large number of tests were included in this study. These 
tests are: 

(1) figure analogies (abstract reasoning) 

(2) Gottschaldt figures 

(3) spatial relations tests, including cube-turning, surface 
development, and a test developed by a member of the 
Civil Service Commission which is similar to the 
block-building test 

(4) formulation 

(5) letter series 

(6) table reading 

(7) vocabulary 

(8) interpretation of data 

(9) hypotheses 

(10) scrambled sentences 

(11) subject matter 2 

Statistical data are not furnished in this report for many of 
these tests. In most cases, the reason for the omission of these 
data is that the correlations were not computed; the scatterplots 
indicated no significant correlations between the test scores and 
the criteria. Where the correlations were computed, they were 
not significantly different from zero. 

As will be noted below, many of these tests are quite brief in 
terms of number of items. This is considered a preliminary 
study and it was thought advisable to try out a large number 
of item types. Because of the short testing time available, it 
was necessary to abbreviate these tests, in some cases probably 
to a level below that needed for obtaining significant data on 
their value or lack of value. 

1. Relationship of test scores to ratings on originality. 3—Subject-matter 
tests produced significant correlations with ratings 
on originality. For example, for 35 physicists at the National 
Bureau of Standards at grades P-1 through P-7, the correlation 
with a test of approximately 100 items in the basic field of 
physics was +.59. For 58 chemists at the Eastern Regional 
Research Laboratory of the Department of Agriculture in 
grades P-1 through P-4, the correlation with a basic test in 

2 These tests are described in an article, "Selection of Physical Scientists," by 
Milton M. Mandell and Sidney Adams, Educational and Psychological Measurement, 
VIII (1948), 575-582. 

3 All correlations included in this report are Pearson product-moment correlations 
unless otherwise noted. 

chemistry of approximately 100 items was +.46. For 17 chemists 
at the Bureau of Standards in grades P-1 through P-6 the 
correlation was the same, +.46. For 53 cases at the Western 
Regional Laboratory of the Department of Agriculture, the 
tetrachoric correlation was +.46 for the chemistry test, with 
the sample including chemists at P-1 through P-4. For 19 
electronics engineers at the Naval Electronics Laboratory in 
grade P-2, the correlation between the same basic physics test 
that was given to physicists and ratings on originality was 
+.58. 

In addition to the subject-matter test, other tests produced 
interesting results. For a test of approximately 35 items 
prepared by Professor Max Engelhart of the Chicago City Junior 
Colleges on the ability to evaluate hypotheses, the correlation 
with ratings on originality for 31 chemists at the Bureau of 
Standards in P-1 through P-6 was +.49. This result did not 
stand up with the Eastern Regional sample of chemists; however, 
the correlation for this test at the Eastern Regional 
Laboratory for 45 chemists engaged in basic research work, 
when over-all ability in basic research was the criterion, was 
+.44. 

In addition to the subject-matter and hypotheses tests de¬ 
scribed above, a test in basic college mathematics of approxi¬ 
mately 30 items correlated +.41 for 62 physicists at the Bureau 
of Standards, with ratings on originality being used as the 
criterion. 

2. Critical ratios between research and non-research groups.—A 
number of tests provided significant differences between the 
mean scores of those engaged in research work, either basic or 
applied, and of those engaged in auxiliary work in the sciences, 
such as testing. 

The formulation test, in the form administered at the Bureau 
of Standards, consisted of 15 items which involved the ability 
to translate a narrative statement into an algebraic equivalent. 
It produced significant differences at the 1 per cent level of 
confidence between 20 chemists engaged in research work and 
6 chemists not in research work. The mean score of the research 
workers was 8.9, and the mean score of the non-research workers 
was 5.5. 

A scrambled sentences test of seven items which involved the 
ability to determine what the last word of the sentence would 
be if the sentence were correctly arranged also produced a 
significant difference at the 1 per cent level of confidence 
between research and non-research chemists. There were 28 
chemists in the research group with a mean score of 2.8, and 
8 chemists in the non-research group with a mean score of 1.9. 

For physicists, the same formulation test described above 
also differentiated between research and non-research physicists 
at the 1 per cent level of confidence. The mean score of 23 
research physicists in the formulation test was 11.0, while the 
mean for 13 non-research physicists was 7.8. 

The table reading test of the Air Force was included in the 
battery of tests. This is essentially a test of carefulness, visual 
acuity, and attention to detail. For a group of 56 engineers, 17 
of whom were in research work and 39 of whom were in non¬ 
research work, this test, which takes about 7 minutes to ad¬ 
minister, produced a critical ratio significant at the 5 per cent 
level of confidence. The mean score for research engineers was 
17.9; the mean score for non-research engineers was 13.9. These 
engineers were in the electrical and mechanical fields at the 
Naval Ordnance Laboratory and were in grades P-1 through 
P-3. 

The same table reading test produced significant results on 
another population of engineers. This sample of 52 engineers in 
grade P-3 consisted of 29 research engineers and 23 non-research 
engineers. In this case, the test score was the number wrong 
rather than the number right. The mean score of the non- 
research engineers in terms of number wrong was higher than 
the mean score of the research engineers, with a difference 
significant at the 5 per cent level of confidence. The mean score 
for the research engineers was .57 wrong answers; the mean 
score of number wrong for the non-research engineers was 1.24 
answers. 

A similar analysis was based upon 87 engineers at the Naval 
Electronics Laboratory in grades P-1 through P-4. Thirty-two 
of these engineers are in research work and 55 are in 
non-research work. Differences which were significant at the 1 per 
cent level of confidence, in favor of the research engineers, were 
obtained on the formulation test described above and on a 
vocabulary test. There was a difference of five points in the 
mean scores on these two tests in favor of the research engineers. 

A significant difference at the 1 per cent level of confidence 
was also obtained on a test of spatial relations in favor of the 
non-research engineers. This test in spatial relations was pre¬ 
pared by a member of the staff of the United States Civil 
Service Commission and is similar to the block-building test 
frequently used. The average score of the non-research en¬ 
gineer on this test was 16 points higher than the average score 
of the research engineer. 

3. Correlation of test scores with ability in basic research.—It 
was possible to obtain a group of 45 chemists from the Eastern 
Regional Research Laboratory who were engaged in basic 
research work. These chemists were rated by colleagues and 
supervisors on a five-point graphic rating scale. The method 
for determining the rating on basic research ability was to add 
up the ratings on over-all ability and divide by the number of 
ratings. In addition to the results with the hypotheses test 
mentioned above, namely, a correlation of +.44, a correlation 
of +.61 was also obtained with the basic chemistry test de¬ 
scribed above for these 45 chemists in basic research. This was 
the only group in basic research sufficiently large in numbers 
to justify the isolation of the group for correlation purposes. A 
number of other tests were tried out with this group but none 
produced significant results. 

Summary 

1. The formulation test seems to have the widest usefulness 
in differentiating research from non-research personnel. 
Significant differences at the 1 per cent level were obtained with 
samples from the fields of physics, chemistry, and engineering. 

2. Subject-matter tests also provided pertinent data for 
physicists, chemists, and engineers, using ratings on originality 
as the criterion. 

3. The other tests produced significant results but their 
usefulness was more limited. The mathematics test correlated 
significantly with ratings on originality for physicists; the 
scrambled sentences test differentiated between research and 
non-research chemists; the table reading, vocabulary, and a 
form of block-building test produced significant data for 
engineers. 

4. The results obtained for these tests from the various 
samples differed in a number of cases. The differences may be 
due to differences in the samples, the nature of the work, 
differences in criteria content, or reliability. 



PROBABILITY APPROACH TO FORECASTING 
UNIVERSITY SUCCESS WITH MEASURED 
GRADES AS THE CRITERION 


L. J. LINS 

University of Wisconsin 

For some time, emphasis has been placed upon the ability 
to forecast academic success in terms of grade-point averages 
at the University of Wisconsin. As early as 1909, Dearborn 1 
attempted to discover whether relative standings in the sec¬ 
ondary school were indicative of academic success at the Uni¬ 
versity. As time progressed, various persons investigated the 
possibilities of “predicting” grade-point averages through mul¬ 
tiple regression. 

In September, 1918, r6S7 University of Wisconsin freshmen 
took the American Council Psychological Examination. Seven 
hundred and fifty-six of these freshmen were selected as a 
sample. All were in attendance at the University for at least 
one year after taking the examination. American Council 
percentiles (based upon national norms) and high-school percentile 
ranks were computed. Zero-order Pearson-Product-Moment 
Coefficients of Correlation were then calculated between these 
respective factors and grade-point averages for the freshman 
year. A multiple coefficient of correlation of .711 resulted. 2 Thus 
about 50 per cent of the variance of grade-point average was 
associated through regression with the two independent vari¬ 
ables named. This shows a substantial concomitant variation. 
Since no factors have been found which would forecast uni¬ 
versity success better, it was thought advisable to employ the 
American Council Psychological percentile and the high-school 
rank percentile in this study. 
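
The step from a multiple correlation of .711 to "about 50 per cent of the variance" is the square of the coefficient. A minimal sketch of that arithmetic follows; the function name is an illustrative assumption.

```python
def variance_explained(multiple_r: float) -> float:
    """Proportion of criterion variance associated with the predictors (R squared)."""
    if not -1.0 <= multiple_r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return multiple_r ** 2


# The coefficient reported in the text.
share = variance_explained(0.711)
```

The result, roughly .506, is what the text rounds to "about 50 per cent".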

The approach here employed is one of trying to set up a 

1 Gustav T. Froehlich, "The Prediction of Academic Success at the University of 
Wisconsin," The University of Wisconsin Bureau of Guidance and Records, Bulletin 
4574, Series 1358, October 1941, p. 3. 

2 Ibid., 20-24. 

system of success or failure probabilities associated with 
bivariate quarter ranges of the American Council Psychological 
and high-school rank percentiles. The sample consists of 1789 
freshmen, 1189 men and 600 women, who entered the University 
of Wisconsin, First Semester, 1948-49. All are residents 
of Wisconsin and were graduated from Wisconsin high schools. 
Nine per cent were graduated with secondary classes of 30 or 
fewer students, 13 per cent with classes of 31 to 60, 27 per cent 
with classes of 61 to 150, and 51 per cent with classes of 151 or 
over. 
The American Council Psychological Examination, 1947 edi¬ 
tion (local norms), and high-school rank percentiles were com¬ 
puted by the Student Counseling Center, University of Wis- 

TABLE 1 

Mean, Standard Deviation of the Distribution, Standard Error of the Mean, and Critical 
Ratio of the Difference Between Means for the Samples Used 

                                       Men                        Women              C.R. 
                                                                                   of Diff. 
Variable                       M        σ       σ_M       M        σ       σ_M     of Means 

Grade-Point Average           1.178    .841     .016     1.409     .810     .035     5.33 
High-School Rank 
  Percentile                 64.936  15.51      .793    77.101   11.30      .910    10.01 
American Council 
  Psychological Percentile   51.315  18.55      .876    47.800   17.119    1.169     3.09 



consin. First-semester grade-point averages at the University 
and the above-mentioned percentiles were recorded. The sub¬ 
jects were divided into two groups by sex in an attempt to 
discover whether or not the men and women differed signifi¬ 
cantly in the factors under consideration. If significant dif¬ 
ferences were found, it would indicate that, for the type of 
approach herein described, it would be better to forecast uni¬ 
versity success separately by sex rather than by using the whole 
group without reference to sex. 

The groups are described in Table 1. It is seen that the 
freshman women maintained a higher mean grade-point average 
than the men in their first semester at the University of 
Wisconsin and ranked higher on the average in the high-school 
classes with which they were graduated. However, the mean 
percentile rank of the men on the American Council Psychological 
Examination was higher than that of the women. 

Again referring to Table 1, one notes that the means of the 
men and women on the three variables differ significantly. This 
would indicate that there is cause for keeping the samples of 
men and women separate. 
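
A critical ratio of the difference between two means can be sketched as below. The report does not state the formula it used; the sketch assumes the usual form for independent samples, the difference divided by the square root of the summed squared standard errors. The function names and the figures in the example are hypothetical and are not taken from Table 1.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of a sample mean from the standard deviation and sample size."""
    if n < 1:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


def critical_ratio(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Difference between two independent means in units of its standard error."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    if se_diff == 0:
        raise ValueError("standard error of the difference is zero")
    return (mean_a - mean_b) / se_diff


# Hypothetical groups: means 1.40 and 1.20, SDs 0.80 and 0.85, sizes 600 and 1200.
se_first = standard_error_of_mean(0.80, 600)
se_second = standard_error_of_mean(0.85, 1200)
cr = critical_ratio(1.40, se_first, 1.20, se_second)
```

By the convention of the period, a critical ratio of about 3 or more in absolute value was usually read as a significant difference.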

A significant association is found between grade-point average 
and each of the two predictors, high-school rank and the results 
of the American Council Psychological Examination, using the 
Pearson product-moment method of correlation. The correlation 
coefficients together with the critical ratios are presented in 
Table 2. 
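
The critical ratio of a correlation coefficient can be sketched as follows, assuming the standard error of a zero-order coefficient is taken as one over the square root of N minus one, which is how the garbled footnote to Table 2 appears to read. The function name and the example inputs are illustrative assumptions.

```python
import math


def critical_ratio_of_r(r: float, n: int) -> float:
    """Critical ratio of a correlation: r divided by sigma_r = 1 / sqrt(N - 1)."""
    if n < 2:
        raise ValueError("need at least two cases")
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    sigma_r = 1.0 / math.sqrt(n - 1)
    return r / sigma_r


# Hypothetical example: r = .50 on 101 cases.
example_cr = critical_ratio_of_r(0.50, 101)
```

With samples as large as those in this study (1189 men and 600 women), even modest coefficients yield large critical ratios under this formula.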

In setting up the system of success or failure probabilities, 
each of the two groups, that is men and women, was then 
subdivided into 16 bi-quarter categories according to percentile 

TABLE 2 

Coefficients of Correlation with the Dependent Variable of First-Semester Grade-Point 
Average Together with the Critical Ratio of the Coefficient* 

                                                  Men                 Women 
Variable                                      r       C.R.r       r       C.R.r 

High-School Rank Percentile                  [?]      [?]        [?]      [?] 
American Council Psychological Percentile    [?]      [?]        [?]      [?] 

* $\mathrm{C.R.}_r = r / \sigma_r$, where $\sigma_r = 1 / \sqrt{N - 1}$. 


rank on the American Council Psychological Examination and 
in high-school class. The resulting “cells” were composed of 
individuals who had approximately the same percentile ranks. 
For example, all individuals who ranked between the first and 
the twenty-fifth percentile on both factors would be in the 
same “cell.” 
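
The assignment of individuals to the 16 bi-quarter "cells" can be sketched as below. The quarter boundaries (0-24.9, 25-49.9, 50-74.9, 75 and above) follow the ranges the article gives later for the quarter values; the function names and the sample records are hypothetical.

```python
from collections import Counter


def quarter(percentile: float) -> int:
    """Quarter (1 to 4) into which a percentile rank falls."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    return min(int(percentile // 25) + 1, 4)


def cell(hs_rank_pct: float, acpe_pct: float) -> tuple[int, int]:
    """Cell label, high-school rank quarter first, examination quarter second."""
    return quarter(hs_rank_pct), quarter(acpe_pct)


# Hypothetical records: (high-school rank percentile, examination percentile).
records = [(12.0, 20.0), (80.0, 55.0), (81.5, 60.0), (49.9, 75.0)]
cell_counts = Counter(cell(hs, acpe) for hs, acpe in records)
```

Each of the 16 possible labels then identifies one group of individuals with approximately the same pair of percentile ranks.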

Each “cell” was then divided according to grade-point averages 
of the individuals within the “cell.” In computing 
grade-point averages at the University three grade points are assigned 
for each credit at grade of A, two for each grade of B, one for 
each grade of C, zero for each grade of D, minus one-half for 
each condition grade, and minus one for each grade of failure. 
Averages as computed followed this pattern. Therefore a B 
average was considered as 2.00-2.99, C as 1.00-1.99, D as 0.00- 
0.99, and Fail as -1.00-(-0.01). Frequency distributions were 
then set up for each “cell” and percentages based upon the 
total individuals in the “cell” computed. In addition, since a 
grade-point average of 1.00 is necessary for satisfactory progress 
in the University, all grade-point averages of 1.00 or above were 
considered as successful. As presented in Table 3 and Table 4, 
this gave the probability of success of entering freshmen. 
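
The grade-point scheme and the percentages within a "cell" can be sketched as follows. The sketch assumes the grade points are applied per credit and averaged over credits, and that an average of 1.00 or higher counts as success (the C band starts at 1.00 and is counted toward success in the tables). The grade labels, including "Con" for a condition grade, and all sample values are hypothetical.

```python
from collections import Counter

# Grade points per credit as described in the text; the labels are assumed.
GRADE_POINTS = {"A": 3.0, "B": 2.0, "C": 1.0, "D": 0.0, "Con": -0.5, "F": -1.0}


def grade_point_average(courses: list[tuple[float, str]]) -> float:
    """Credit-weighted average; each course is (credits, grade label)."""
    total_credits = sum(credits for credits, _ in courses)
    if total_credits <= 0:
        raise ValueError("total credits must be positive")
    points = sum(credits * GRADE_POINTS[grade] for credits, grade in courses)
    return points / total_credits


def grade_level(gpa: float) -> str:
    """Band used in the tables: B (or better), C, D, or Fail."""
    if gpa >= 2.0:
        return "B"
    if gpa >= 1.0:
        return "C"
    if gpa >= 0.0:
        return "D"
    return "Fail"


def cell_percentages(gpas: list[float]) -> dict[str, float]:
    """Percentage of a cell's members falling in each band."""
    if not gpas:
        raise ValueError("cell is empty")
    counts = Counter(grade_level(gpa) for gpa in gpas)
    return {level: 100.0 * counts[level] / len(gpas)
            for level in ("B", "C", "D", "Fail")}


def probability_of_success(gpas: list[float]) -> float:
    """Per cent of the cell with a C average or better."""
    pct = cell_percentages(gpas)
    return pct["B"] + pct["C"]


# Hypothetical cell of four students.
cell_gpas = [2.5, 1.2, 0.5, -0.3]
success = probability_of_success(cell_gpas)
```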

TABLE 3 

Probability of Academic Success of New Male Freshmen Based Upon High-School 
Percentile Rank and Percentile Rank American 
Council Psychological Examination* 

                                American Council Psychological Percentile 
High-School        Grade 
Rank Percentile    Level          0-24       25-49      50-74      75-100 

75-100             B               14         19         32         45 
                   C               49         56         51         45 
                   Sum             63         75         83         90 
                   - - - - -      (49)      (107)      (136)      (228) 
                   D               33         20         15          9 
                   Fail             4          5          2          1 
                   Sum             37         25         17         10 

50-74              B                6          5         15         14 
                   C               40         50         51         55 
                   Sum             46         55         66         69 
                   - - - - -      (85)      ([?])      (9[?])      (65) 
                   D               47         [?]        [?]        28 
                   Fail             7         [?]        [?]         3 
                   Sum             54         45         34         31 

25-49              B                1          0          8          0 
                   C               28         33         40         64 
                   Sum             29         33         48         64 
                   - - - - -      (80)       (66)       (48)       (22) 
                   D               [?]        41         44         18 
                   Fail            [?]        26          8         18 
                   Sum             71         67         52         36 

0-24               B                0          6          0         18 
                   C               17         24         47         24 
                   Sum             17         30         47         42 
                   - - - - -     ([?])†      (33)       (19)       (17) 
                   D               50         [?]        32         35 
                   Fail            33         [?]        21         [?] 
                   Sum             83         70         53         58 


* Probability of success is based upon experience with first-semester freshmen 
1948-49 who were graduates of Wisconsin High Schools. The interpretation might 
be as follows: 

It has been our experience that 83 per cent of the men ranking below the 
twenty-fifth percentile on the American Council Psychological Examination (local norms) 
and in high school class were not successful as first-semester freshmen. 

† The number in parentheses is the size of the sample. Numbers above the broken 
line are probabilities of receiving a C or B or better average. The sum of these two is 
the probability of success. 

In addition to the interpretation as presented in the footnotes 
of Tables 3 and 4, it seemed desirable to determine a point at 
which the probability of success would be equal to, or greater 
than, the probability of failure. Integral values from one to 
four were assigned to the percentile divisions by quarters of the 
American Council Psychological and high-school rank. Thus the 
quarter 0-24.9 has a value of one, 25-49.9 a value of two, 
50-74.9 a value of three, and 75-99.9 a value of four. It is 
interesting to note for the men, excluding the lowest quarter 


TABLE 4 

PnJabililj (tf Academu Success nj AYir Female Freshmen Based upon High-School 
Perce mile Rank ami Percentile Rank American Council Psychological Examination* 


American Counc.il Psychological PctccnLile 


High 

tUnSi Pcttttllllc 

CratJd 

MV«l 

0-21 

25 49 

Sa-74 

75-100 

7 $ -ico 

B 

C 

n 

Fail 

' 4 68 

4 (Al) 

' 

J? 80 

( 107 ) 

20 

O 

37 90 

3 ( 106 ) 

? 10 

60 

34 94 

( 115 ) 

I*' 

jo-74 

R 

C 

L) 

Kail 

3« ^ 

(47) 

8 5 s 

44 (54) 

1 < 8 

•3 es 

S 2 4 
(23) 

So 7 > 

( 14 ) 


B 

C 

4 « 

ris) 

6 50 

44 

( 16 ) 

*9 

(7) 

°o 0 

( 0 ) 

ij~49 

» 

Fail 

36 sfi 

37 so 

1.1 5 

71 7i 

0 - 

0 -I 4 

B 

C 

0 It 

3 J J 

(I 6 )t 

0 33 

33 JJ . 

(3) 

15 SO 

45 4M 
(4) 

( 0 ) 

D 

Fail 

38 69 

3* ' 

6 l *1 

55 jo 
aj 4 



* Probability of success is based upon experience with first-semester freshmen
1948-49 who were graduates of Wisconsin High Schools. The interpretation might be
as follows:

It has been our experience that 60 per cent of the women ranking below the
twenty-fifth percentile on the American Council Psychological Examination (local norms)
and between the fiftieth and seventy-fifth percentile in high-school class were not
successful as first-semester freshmen.

† The number in parentheses is the size of the sample. Numbers above the broken
line are probabilities of receiving a C or B or better average. The sum of these two is
the probability of success.


in high-school rank, that if the quarter value of high-school
rank is added to the quarter value of the American Council
Psychological, generally speaking, a sum of five or more indicates
a 50-50 or greater chance of academic success. A sum of
six or more indicates at least a 64-36 chance of success and a
sum of seven at least 69-31.

The same generally holds true for the sample of women if all
“cells” below the fiftieth percentile in high-school rank are
eliminated. The exceptions are “cells” 1-3 and 2-2, high-school
rank being given first. This may be due to the small frequency
and consequent inadequacy of sampling in the lower ranges.
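
The quarter-value rule stated for the men can be written out as a short lookup. The sketch below is an illustration only: the function names are ours, the chances returned are the lower bounds quoted in the text, and treating a sum of eight like a sum of seven is an assumption the text does not state.

```python
def quarter_value(percentile):
    """Map a percentile rank to its quarter value (1 for 0-24.9, ..., 4 for 75 and up)."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    return min(int(percentile // 25) + 1, 4)


def minimum_stated_chance(hs_percentile, acpe_percentile):
    """Lower bound on the chance of success quoted in the text for men.

    Returns None where the text makes no statement: the lowest quarter in
    high-school rank is excluded, and sums below five carry no stated bound.
    """
    hs_quarter = quarter_value(hs_percentile)
    if hs_quarter == 1:
        return None
    total = hs_quarter + quarter_value(acpe_percentile)
    if total >= 7:  # assumption: a sum of eight is treated like a sum of seven
        return 0.69
    if total >= 6:
        return 0.64
    if total >= 5:
        return 0.50
    return None


if __name__ == "__main__":
    # High-school rank at the 80th percentile, examination at the 30th percentile.
    print(minimum_stated_chance(80, 30))
```

A student in the top high-school quarter and the second examination quarter has a sum of six, so the rule places his chance of success at 64 per cent or better.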

Since percentile rank in high-school class is directly affected 
by size of class, it is assumed that any forecasts for persons 
graduated with small classes will not be particularly valid. It 
was thought advisable, therefore, to either eliminate graduates 
of small high schools or to arrive at separate success proba¬ 
bilities for this group. 

In applying the regression equation in use at the University 
of Wisconsin for forecasting grade-point averages, it was found 
that a difference in percentile rank of one would not affect the 
forecasted grade-point average by more than 0.05 grade point 
where the size of class is 30 or above. A graduating class of 30 
was then selected as the division point between the small and 
large high school. In applying the same procedure for success 
probabilities as outlined for the whole group, it was found that 
eliminating the graduates of small high schools did not affect 
the probabilities previously reported. It was impossible to arrive 
at any accurate success probabilities for the small class because 
of limited size of sampling. Thus the probabilities of the group 
of less than 30 in graduating class and the group from classes 
of 31 or more students are not reported here. 

It would seem from the results presented that success proba¬ 
bilities could be very beneficial in the educational guidance 
program both before entering the University and during Fresh¬ 
man Orientation Week. Power of discrimination seems evident. 
With larger samples and differentiation by colleges, the proba¬ 
bility forecast might well take the place of the grade-point 
average forecast. It might also be more readily understood by 
the prospective student. Rough measures have been used. It is 
the feeling of the writer that the results have been interpreted 
previously as if these rough measures were precision instru¬ 
ments. Therefore possibly too much emphasis has been placed 
earlier upon the small differences between forecasted grade- 
point averages. 



PREFERENCES AND BEHAVIOR RATINGS OF 
DOMINANCE 


WILLIAM R. BIRGE
Rensselaer Polytechnic Institute

It is well recognized that there is not a necessary corres¬ 
pondence between a person's conduct and his report of his 
conduct. This situation is generally acknowledged in the field 
of interest and personality measurement. Meehl and Hathaway 
(2) have observed that whether or not a person reports his 
conduct accurately on a questionnaire, his answers may still 
constitute a significant aspect of his behavior. Kuder (1) points 
out that there is no necessary relation between scales on his 
Preference Record-Personal and the corresponding areas of 
actual behavior, but he believes that the use of a number of 
relatively independent scales is a promising starting point for 
prediction studies. 

This paper, however, is concerned with the question 
of whether there is a correspondence between conduct and 
verbal report. The criterion of behavioral ratings was used as 
the measure of conduct, while the verbal responses were obtained
through the use of the Kuder Preference Record-Personal.

In connection with another study, the writer obtained soci¬ 
ometric ratings on the trait of dominance from the members 
of eleven fraternity groups, three sorority groups and two 
female dormitory groups. Dominant individuals were defined 
as those who "show the greatest assertiveness and ability to 
influence others in group situations.” The ratings were made 
on a total of 827 subjects. 

With the exception of three small fraternities, the four mem¬ 
bers from each group who received the highest ratings on 
dominance and the four members who received the lowest 
ratings were selected for further study. From the three small 
fraternities, only two members from each extreme of the dominance
ratings were selected. Since, in two groups, three individuals
tied for the third from the lowest ratings, there were
58 subjects in the high dominant extreme and 60 subjects in
the low dominant extreme.

All of these subjects were requested to fill out the Kuder
Preference Record-Personal. The response was fairly good.
Although 11 members of the high dominant group and 15 members
of the low dominant group refused to cooperate in the
study, there remained a pool of 92 records for analysis.
Forty-seven of these records had been filled out by subjects who
received the highest ratings for dominance, while 45 forms had
been filled out by subjects who received the lowest ratings for
dominance.


TABLE 1

The t's of the Differences Between the Mean Scores on Each of the Six Scales for the High
and Low Dominant Groups

Scale   Mean Score (N = 47)    Mean Score (N = 45)    t       p
        High dominant group    Low dominant group

A       41.17                  38.04                  2.09    .04
B       32.40                  30.96                   .71    .48
C       37.09                  33.44                  1.54    .12
D       37.91                  40.76                  1.18    .24
E       52.??                  47.16                  2.27    .02
H       74.64                  81.87                  2.10    .04


The 92 records were scored for scales A, B, C, D, and E.
The five areas of activity related to these scales are as follows: 

A. Preference for taking the lead and being in the center of 
activities involving people.

B. Preference for dealing with concrete problems and every¬ 
day affairs rather than interest in imaginative activities. 

C. Preference for thinking, philosophizing, and speculating. 

D. Preference for pleasant and smooth personal relations
which are free from conflict. 

E. Preference for activities involving the use of authority 
and power. 

In addition to the five regular scales, the records were also 
scored on the H scale, an experimental scale designed to 
measure the degree to which an individual deliberately tries to 
make a good impression on the test as a whole. It has been
found that individuals who attempt to make a good impression,
rather than to answer sincerely, generally receive low H
scores. A personal communication from Mrs. Phyllis Cram, of
Sears, Roebuck and Co., suggested that the H scale might
discriminate between dominant and non-dominant groups.
Mrs. Cram tested several administrators with the Kuder Preference
Record-Personal, and found that the abler administrators
tended to receive lower scores on the H scale than did the
less able administrators. Mrs. Cram believes that, in this case, 
there should be no implication that the good administrators 
were insincere in their answers. She suggests the explanation 
that these people are “adept at creating a good impression.... 
They are playing their roles expertly, and an effective actor is 
always sincere even if it is a role.” 

After the records had been scored, the mean scores on each 
of the six scales were determined for the high and low dominant 
groups. The t’s of the differences between these means were 
then computed. The results of this analysis are presented in
Table 1.
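
As a rough check on the probabilities in Table 1, the two-tailed p for each t can be recomputed. The sketch below is our illustration, not the writer's procedure: the t values are as we read them from Table 1, and the 90 degrees of freedom (47 + 45 - 2) assume an ordinary pooled-variance t test, which the paper does not state. The probability is obtained by integrating the t density numerically with Simpson's rule, so no statistical library is needed.

```python
import math


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed probability of |T| >= |t|, by Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = 2 * total * h / 3  # area between -|t| and +|t|
    return max(0.0, 1.0 - central_area)


# t values as read from Table 1; the degrees of freedom are an assumption.
TABLE_1_T = {"A": 2.09, "B": 0.71, "C": 1.54, "D": 1.18, "E": 2.27, "H": 2.10}
DF = 47 + 45 - 2

if __name__ == "__main__":
    for scale, t in TABLE_1_T.items():
        print(scale, round(two_tailed_p(t, DF), 3))
```

Under these assumptions the recomputed values fall within about .01 of the tabled probabilities, and only scales A, E, and H fall below .05.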

As indicated in this table, the differences between the high 
and low dominant groups on the three scales A, E, and H are 
significant at the 5 per cent level of confidence. (The H scale
means of the two groups were, however, within the “honest”
limits.) More specifically, in terms of expressed preferences, 
these results indicate that the highly dominant person tends to 
differ from the person with low dominance ratings as follows: 

(1) he prefers to take the lead and be in the center of ac¬ 
tivities involving people; 

(2) he prefers activities involving the use of authority and
power; 

(3) he prefers activities ordinarily chosen by people trying 
to make a good impression.



REPRODUCIBLE SCALES AND THE ASSUMPTION

OF NORMALITY 1

ROBERT G. SMITH, Jr. 

University of Illinois 

The more commonly used statistical tests of hypotheses as¬ 
sume that the universe of values, as measured, is normally dis¬ 
tributed. In some instances, the distribution of scores which an 
investigator obtains from his sample gives him practically no 
confidence that this condition is met. That considerable thought 
has been given to this problem is clear from the recent review 
by Mueller (6) of numerical transformations. The purpose of the 
present paper is to examine some of the characteristics of the 
relatively new technique of reproducible scales from the stand¬ 
point of their use with statistics requiring normality assump¬ 
tions. 

The technique of “Scale Analysis,” originated by Guttman 
(2), has attracted considerable attention, since it promises to
lead to the construction of tests which are unidimensional. 
Loevinger (4, 5), in the area of tests of ability, has dealt with 
the same problem in presenting techniques leading to the con¬ 
struction of “Homogeneous Tests,” as she prefers to call them. 

Tests are used for two major purposes: to order individuals 
in the characteristic being measured, and to test hypotheses 
concerning characteristics. The former may not involve the as¬ 
sumption of normality; the latter requires this assumption if the 
hypotheses are to be tested with statistical techniques such as 
the critical ratio, t, and analysis of variance. Some deviation
from normality may not affect the precision of the statistical 
tests to any great degree. If, however, the user of reproducible 
scales has a distribution of scores which deviates strikingly from 
normality, then the principle to be described in this paper may 
permit him to approximate more closely a normal distribution.

1 The writer wishes to express his appreciation to Dr. L. L. McQuitty for his critical
comments on this paper.

That the assumption of population normality is not amiss in the
case of many reproducible scales gains support from research 
in other forms of measurement. For instance, Thurstone (7, 8) 
assumed normality for the purpose of developing scaling tech¬ 
niques for tests of intelligence and for paired comparisons. He 
was then able to test this assumption experimentally. The test¬ 
ing of the assumption of normality in the area of reproducible 
scales is a topic for further research. 

The common feature of both the Guttman and Loevinger 
techniques is that they aim to construct tests whose items make 
perfect discriminations. In the case of a dichotomously scored 
item, no one who fails the item should make a higher total score 
than one who passes. With a multiple-response item such as is 
used in attitude scales, no one giving a lower weighted response 
should make a higher total score than one who gives a higher 
weighted response. While the major emphasis in the various 
techniques for “Scale Analysis’’ has been in reproducing re¬ 
sponses to individual items from the total score, it is possible 
in a perfectly reproducible scale, since it gives perfect dis¬ 
criminations, to deduce the distribution of the total score from 
the number of individuals giving each response to each item. 
Table 1 shows how this can be done. 
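
The deduction described above can be sketched in a few lines for dichotomous items. This is a minimal illustration under the stated condition of perfect reproducibility; the function name and the example counts are ours, not the author's. In a perfect scale, everyone with a total score of k or more has passed the k-th easiest item, so the number of persons at each score is the difference between successive pass counts.

```python
def score_distribution(pass_counts, n_people):
    """Frequencies of total scores 0..n implied by item pass counts.

    Assumes a perfectly reproducible scale of dichotomous items, so that the
    persons passing a harder item are a subset of those passing an easier one.
    """
    ordered = sorted(pass_counts, reverse=True)  # easiest item first
    if any(c < 0 or c > n_people for c in ordered):
        raise ValueError("each pass count must lie between 0 and n_people")
    # bounds[k] is the number of persons with a total score of at least k.
    bounds = [n_people] + ordered + [0]
    return [bounds[k] - bounds[k + 1] for k in range(len(ordered) + 1)]


if __name__ == "__main__":
    # Hypothetical example: 100 persons, three items passed by 90, 60 and 30.
    print(score_distribution([90, 60, 30], 100))
```

With multiple-response items the same reasoning applies to each cutting point between adjacent response categories rather than to each item.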

This means that it will be possible, by the selection of items 
with properly located cutting points, and by the combination 
of categories in multiple-response items, to obtain a set of 
items which, when combined, give a normal distribution of the 
total score. Such a set of items is shown in Table 1. It will be 
noted that the characteristic of a perfectly reproducible scale 
which gives a normally distributed total score is that the scale 
makes relatively few discriminations between individuals in the
center of the range of scores, and progressively more as the ex¬ 
tremes are approached. While it is, of course, unlikely that a 
perfect normal curve will appear in practice, we should be able 
to approximate normality. 

If a sufficiently large pool of items is available, the investi¬ 
gator may select the number of items he intends to use in the 
scale. Then, if he wants, say, eleven items with cutting points
one-half sigma unit apart, reference to the table of area
under the normal curve will give him the proportions desired 



TABLE 1

Item Responses and Total Scores of Perfectly Reproducible Test

in each item category. If he desires a different number of items
the same procedure may be followed. 2 
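
The look-up in the normal table can be sketched as follows. This is an illustration under assumptions of ours: the cutting points are centered on the mean (so eleven points at one-half sigma spacing run from -2.5 to +2.5 sigma), the items are dichotomous, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Area under the unit normal curve below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cutting_points(n_items=11, spacing=0.5):
    """Equally spaced cutting points in sigma units, centered on the mean."""
    half_span = spacing * (n_items - 1) / 2
    return [-half_span + spacing * i for i in range(n_items)]


def proportion_passing(cuts):
    """Proportion of a normal population falling above each cutting point."""
    return [1.0 - normal_cdf(z) for z in cuts]


def score_proportions(cuts):
    """Proportion of the population at each total score, 0 through len(cuts)."""
    cumulative = [0.0] + [normal_cdf(z) for z in cuts] + [1.0]
    return [cumulative[k + 1] - cumulative[k] for k in range(len(cumulative) - 1)]


if __name__ == "__main__":
    cuts = cutting_points()
    for z, p in zip(cuts, proportion_passing(cuts)):
        print(f"cutting point {z:+.1f} sigma: proportion passing {p:.3f}")
```

Items would then be chosen from the pool so that the observed proportion passing each one comes as close as possible to the listed values.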

As Guttman (3) has pointed out, the rank order of an in¬ 
dividual with regard to a scalable universe of content remains 
invariant no matter which items are used. Therefore, the selec¬ 
tion of items to obtain a normally distributed total score will 
in no way affect the other valuable properties of the scale. In 
fact, the placing of restrictions on the location of the cutting 
points may lead to more efficient scales. 

Two scales may have equal reproducibility, but yet have
different characteristics as regards the differentiation of individuals.
(Compare Tables 1 and 2 in this respect.) It is recognized
that normal distributions may not be desirable in all purposes
to which tests may be put. However, for a given purpose,
if the distribution of cutting points be identical in two tests,
equal reproducibility will mean equal efficiency.

A recent use of analysis of variance with a reproducible scale 
is the study of Gage (1). He, recognizing that the data shown
in his Table 15 did not form a normal distribution, was cautious 
in the interpretation of his results. However, if he had a larger 
pool of items from which to draw, the selection of items with 
properly located cutting points could have given him a normal 
distribution of scores. 

According to Guttman (2), one of the advantages of scaling 
theory is that it does away with "untested and unnecessary 
hypotheses about normal distributions.” Although normality 
assumptions are not required for scale analysis itself, it may be 
necessary in some of the uses to which scales are put. Therefore,
it is desirable to have a principle to use in achieving normal
distributions of total scores on reproducible scales and homogeneous
tests.



A FACTORIAL STUDY OF BELIEFS 1

J. W. HOLLEY 
University of Southern California 
and 

CLAUDE E. BUXTON 
Yale University 

The use of tests of false beliefs is currently popular among 
teachers of beginning psychology. Such tests serve to stimulate
the interest of students in the field of psychology and call atten¬ 
tion to many misconceptions and prejudices at the outset. The 
investigation reported in this paper is concerned with this type 
of test. Our task was to describe such false beliefs in terms of a 
limited number of underlying variables, obtained by the method 
of factor analysis. The results of such an investigation should be
of value to the teacher of beginning psychology, for they ac¬ 
quaint him with the dimensions of misconception among stu¬ 
dents. To students of psychometric techniques, the method 
alone will be the object of concern. 

Statistical background of the study.—In the factor-analysis ap¬ 
proach, the investigator may analyze a correlation matrix in 
which either items or individuals function as variables. The 
inter-individual correlation method, which has often been 
adopted in factorial investigations in aesthetics, is known as 
the "inverted" method of factor analysis. One reason for using
it in this currently unstructured field is that there are so many 
available test items that a matrix of a corresponding order is 
impractical. This is also true in the domain of lay beliefs about 
behavior and about psychology. For this reason the "inverted"
method of factor analysis, or "Q technique," was employed in
our study.

After solving a particular inverted factor-analysis problem, 
we have, as a result, a matrix of common factor loadings. There

1 The first author wishes to express his appreciation to Professor W. Stephenson,
visiting professor at the University of Chicago, for advice regarding the Q technique,
particularly in relation to the importance of item difficulty in this method.

is one row for each individual and one column for each common
factor. The square of each common-factor loading indicates 
that portion of the total variance in any particular individual’s 
beliefs which can be attributed to a single factorial source.
The sum of the common-factor variances for each individual
is known as his communality. In this study we may regard this 
figure as an “index of agreement” between a particular in¬ 
dividual and the other individuals represented in the matrix. It 
indicates the extent to which his evaluations of statements of 
belief were determined, as were those of the other individuals. 
The remaining portion of variance ( + 1.00 minus the commu¬ 
nality value) could be analyzed further into specific and error 
variance. The specific variance (estimate of reliability for an 
individual minus his communality) could be interpreted as an 
"index of individuality” of beliefs, compared to those of other 
individuals in the investigation. The error variance ( + 1.00 
minus the reliability coefficient for the individual) would repre¬ 
sent the remaining portion of variance. In our investigation, 
however, the reliabilities for the various individuals were not 
obtained. Therefore, analysis beyond the determination of com¬ 
mon-factor variances making up the communality for each in¬ 
dividual was impossible. 
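
The partition of variance just described can be sketched as follows. The function names are ours; the reliability argument is hypothetical, since no reliabilities were obtained in this study. The two example rows are the centroid loadings of individuals 1 and 10 as we read them from Table 2.

```python
def communality(loadings):
    """Sum of squared common-factor loadings for one individual."""
    return sum(a * a for a in loadings)


def variance_partition(loadings, reliability=None):
    """Split unit variance into common, specific and error portions.

    The specific and error portions are returned only when a reliability
    estimate (hypothetical here) is supplied.
    """
    h2 = communality(loadings)
    parts = {"communality": h2, "remaining": 1.0 - h2}
    if reliability is not None:
        parts["specific"] = reliability - h2   # "index of individuality"
        parts["error"] = 1.0 - reliability
    return parts


if __name__ == "__main__":
    individual_1 = [0.582, -0.385, -0.029, -0.318]
    individual_10 = [0.419, 0.248, -0.279, -0.250]
    print(round(communality(individual_1), 3))
    print(round(communality(individual_10), 3))
```

For both rows the sum of squares agrees, to three decimals, with the communality printed in Table 2, which offers a simple check on any entry of that table that is hard to read.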

Procedure .—The second author has constructed, over a period 
of years, a 100-item true-false test of misconceptions. 2 The
items retained, for successive editions, were those which showed 
some biserial correlation with the total score on the test, were 
not passed or failed by all subjects, and were worded so that as 
many correct responses were true as were false. Thus, the typi¬ 
cal method of finding all of the items students would fail was 
not used to build this test. This questionnaire is currently used 
in the beginning classes on the Evanston campus of North¬ 
western University. 

From a group of 500 test papers, secured in the fall of 1948, 
30 were randomly selected. From these 30, 20 papers were
finally selected to function as the basic variables of the correla¬ 
tion matrix. (The 10 papers which were eliminated were those 

2 Some of the items were taken from Valentine (5), some from Garrett and Fisher
(1), and some were obtained personally from C.d'A. Gerken of the University of Iowa. 
A mimeographed copy of this test may be obtained by writing to the second author. 



402 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


[TABLE 1. Correlation matrix with the twenty individuals as variables; the entries are not legible in this copy.]


with the most extreme scores, i.e., those individuals who either 
answered almost all or very few of the items correctly. The 
reason for this final selection was that we wished to avoid cell 
entries, in the tetrachoric correlations, which were close to
zero.) 

A matrix of 20 variables was thus obtained from the tetrachoric
correlations of the scores of each individual with each of
the other nineteen individuals. This correlation matrix, with 
individuals as variables, is presented in Table 1. 
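
The correlation between two individuals rests on a fourfold table of their item scores. The paper does not say how the tetrachoric coefficients were computed; the sketch below substitutes the well-known cosine-pi approximation purely for illustration, and the function names are ours. It also shows why near-zero cell entries were avoided: with an empty cell the estimate degenerates.

```python
import math


def fourfold_counts(x, y):
    """Fourfold table for two individuals' item scores (1 = passed, 0 = missed).

    Returns (a, b, c, d): both passed, only x passed, only y passed, both missed.
    """
    if len(x) != len(y):
        raise ValueError("both individuals must answer the same items")
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    return a, b, c, d


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation."""
    if a * d == 0 or b * c == 0:
        raise ValueError("an empty cell makes the estimate degenerate")
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


if __name__ == "__main__":
    # Hypothetical table: the two individuals agree on 80 of 100 items.
    print(round(tetrachoric_cos_pi(40, 10, 10, 40), 3))
```

Applying the function to every pair of the twenty individuals would yield a 20 by 20 matrix of the kind described above.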

TABLE 2


Centroid Factor Loadings (and Communalities)


Individuals

Factor I

Factor II

Factor III

Factor IV

Communalities

I 

.582 

-.385 

— .029 

-.318 

■ 589 

1 

497 

.120 

-.378 

-• 33 ! 

■ 517 

3 

.726 

.197 

.084 

.21J 

.619 

4 

.608 

338 

— . 500 

-.149 

• 756 

5 

■ 35 i 

-.138 

-.268 

.294 

•338 

6 

955 

-•494 

— .261 

.123 

.756 

7 

.646 

■351 

.308 

.274 

*710 

8 

.386 

— 167 

.076 

“•474 

■ 4 JI 

9 

.416 

.186 

.419 

,o6j 

•435 

10 

.419 

.248 

-.279 

— .250 

• 377 

n 

•394 

-.133 

3'9 

—. 148 

• 333 

12 

•396 

-■334 

167 

•*34 

• 35 ‘ 

13 

•598 

-.13* 

—. 108 

125 

.402 

14 

.669 

.216 

. 1 10 

.023 

.507 

i 5 

•529 

-.103 

—. 106 

.267 

■373 

l6 

.488 

,117 

-.098 

. I46 

.278 

17 

■313 

-■305 

.312 

—. 189 

■ 3 H 

18 

■943 

— ,091 

-.103 

.089 

,916 

19 

■ 5*1 

• 34 ' 

097 

.271 

•471 

20 

,496 

.398 

,207 

-.271 

,498 


From this correlation matrix, four centroid factors (see Table 
2) were extracted according to Thurstone's centroid method of 
factor analysis (2). The reference axes were then rotated follow¬ 
ing Zimmerman’s graphic method (6), so as to minimize the 
number of zero loadings. The rotations are presented in Table
3, while the final rotated factor loadings are presented in Table
4.
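
The first step of the centroid method can be sketched briefly. This is a simplified illustration, not a reproduction of the computations in this study: it extracts only the first centroid factor and the residual matrix, it assumes the diagonal cells already hold communality estimates, and it omits the reflection of signs needed before later factors are extracted. The function names are ours.

```python
import math


def first_centroid_loadings(r):
    """First centroid factor loadings of a square correlation matrix.

    Each loading is the column sum divided by the square root of the grand
    total. The diagonal is assumed to hold communality estimates.
    """
    n = len(r)
    if any(len(row) != n for row in r):
        raise ValueError("the correlation matrix must be square")
    column_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
    grand_total = sum(column_sums)
    if grand_total <= 0:
        raise ValueError("the grand total must be positive")
    root = math.sqrt(grand_total)
    return [s / root for s in column_sums]


def residual_matrix(r, loadings):
    """Correlations remaining after the factor's contribution is removed."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Hypothetical three-variable matrix with communality estimates on the diagonal.
    r = [[0.60, 0.50, 0.40],
         [0.50, 0.50, 0.30],
         [0.40, 0.30, 0.40]]
    loadings = first_centroid_loadings(r)
    print([round(a, 3) for a in loadings])
```

The residual matrix would then be treated in the same way, after sign reflection, to obtain the second and later centroid factors.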

Interpretation of factors .—An important problem in the use of 
the “Q technique” is determining the meaning of the extracted
factors. The rotated factor loadings, by themselves, tell us very
little about the nature of the factors, for they merely indicate
the rank order of the individuals in regard to these dimensions.
Such a rank order leaves much to be desired in clarifying such
meanings. 

Stephenson (fi attempted to solve this problem in the case of 
aesthetic judgments by interrogating the subjects in regard to 
their preferences and by observing the judgments of those individuals
who were highly saturated in one factor. Guilford and
Holley (1) employed a system of weighted judgments. They



TABLE 3 




Rotation of Centroid Axes



Axes

Degrees

Direction


r & n 

r k tv 

46 

counter-clockwise


SO 

counter-clockwise


i" fif tu 

57 

counter-clockwise 


r" & ii' 

V 

clockwise 


TABLE 4 


Rotated Factor Loadings


Individuals

Factor I

Factor II

Factor III

Factor IV

I 

.431 

-.044 

■ 55 fi 

■ 318 

1 

.687 

•J 4 S 

• 149 

-.017 

3 . 

.370 

.603 

.103 

.415 

4 

• 790 

.352 

— .080 

.041 

; 

,360 

- -°J 5 

—. 1 r* 

• 507 

l 

.46I 

“.091 

.316 

.699 

7 

.op 

• 770 

.098 

■327 

B 

• 30 ? 

-.051 

■ 594 

.047 

9 

-.085 

.601 

.231 

.106 

JO 

■ 55 * 

.249 

,070 

-.074 

it 

.0C9 

■ *33 

• 5°7 

.244 

12 

“•,017 

.073 

,li 8 

■545 

LI 

■346 

,001 

.136 

.471 

>4 

.393 

■559 

.112 

.252 

is 

■ 357 

.216 

,007 

.510 

l 5 


•346 

.007 

.289 

17 

■“,033 

.026 

.500 

.313 

18 

• 547 

• 4 >i 

•vs 

.609 

19 

.143 

.6i 4 

-.059 

.264 

20 

.249 

.561 

■ 33 ° 

-.113 


obtained the product of the factor loading of the individual by 
the rating given by the individual to a particular object. From 
these scores for the various objects in the aesthetics study re¬ 
ported by these investigators, it was possible to arrange the 
objects according to the magnitude of these scores for the vari¬ 
ous factors, and, thus, to name the underlying variables. 

In order to determine the nature of the factors extracted in 
our investigation, biserial correlations were obtained, for each
item between whether or not the items were passed by the vari¬
ous individuals, and the factor loadings of these individuals. A 
perfect correlation, then, would be one in which all individuals 
who missed a particular item also obtained the highest factor 
loadings. A perfect correlation of an item with the loadings of a 
particular factor would mean that it measured individual differences
maximally in regard to that particular dimension. 3
That is, it would differentiate, most efficiently, those individuals 
with high factor loadings from those with low factor loadings. 
Groups of items which differentiated maximally were used as 
the basis for naming the factors; that is, we selected clusters of 
items with the highest correlations and observed the common 
element among them. 
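
The item-by-factor correlations can be sketched with the usual biserial formula. This is an illustration under assumptions of ours: the dichotomy is coded 1 for a missed item so that, as in the sections that follow, a positive coefficient means the item was missed by individuals with high loadings; the standard deviation is taken over all individuals; and the function name and example data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def biserial(missed, loadings):
    """Biserial correlation between missing an item and factor loadings.

    missed   -- 1 if the individual missed the item, 0 if he passed it
    loadings -- the individuals' loadings on one rotated factor
    """
    if len(missed) != len(loadings):
        raise ValueError("one entry per individual is required in each list")
    high = [a for m, a in zip(missed, loadings) if m]
    low = [a for m, a in zip(missed, loadings) if not m]
    if not high or not low:
        raise ValueError("the item must be missed by some but not all individuals")
    sd = pstdev(loadings)
    if sd == 0:
        raise ValueError("the loadings must vary")
    p = len(high) / len(loadings)
    q = 1.0 - p
    unit_normal = NormalDist()
    ordinate = unit_normal.pdf(unit_normal.inv_cdf(p))  # height at the point of division
    return (mean(high) - mean(low)) / sd * (p * q / ordinate)


if __name__ == "__main__":
    # Hypothetical loadings for six individuals; the first three missed the item.
    loadings = [0.55, 0.40, 0.48, 0.30, 0.42, 0.20]
    missed = [1, 1, 1, 0, 0, 0]
    print(round(biserial(missed, loadings), 2))
```

Because the biserial coefficient assumes a normal variable underlying the dichotomy, it can exceed unity when a small group divides very sharply, which should be kept in mind when values close to 1.00 are obtained from twenty individuals.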

Identification of factors ,—In the description of the items be¬ 
low, a positive correlation indicates that the item tended to be 
passed by those individuals with low factor loadings but missed 
by those individuals with high factor loadings. In the case of 
negative correlations, the converse is true. Since our factors 
represent areas of misconception, the positively correlated items 
are most useful as descriptive of false belief, while the negatively 
correlated items are most useful when contrasted to these. As a 
convenience, the biserial correlation of an item, together with 
its scoring key and its level of difficulty as indicated by the 
number of individuals missing the item, will be presented for 
each statement which is quoted. 

Factor I. —This seems to be a factor of general psychological 
naïveté. It indicates a lack of technical knowledge about psychology.
Those items which describe the factor most clearly are:

"The printing on this page is upside-down on your retina.” 
(true) r = +.88 (5 missed) 

3 In the Q technique, the factor loading of an individual does not represent the
amount of a certain factor present, if the concept of “amount” is defined in terms of
expected scores from factorially pure tests. The reason for this is that the Q technique
assumes that the means (in this case of misconceptions) are equal to zero and that the
variabilities are equal to one for all individuals. The squared factor loading in the Q
technique represents that portion of the variance in the misconceptions which the
individual has which correlates with the various factors. If then we wanted to know “how
much” (as defined by the subsequent scores on a hypothetical “pure factor” test of
this dimension), we would have to adjust the squared factor loadings for the amount
of misconception, as indicated by their total scores, and for the variability of the
individual’s scores. This adjustment of variances was not carried out in this particular
study, although it should be in subsequent studies of this type. It was felt that the
individual differences in the means and variabilities were not sufficiently great to
necessitate a reworking of the data.

“Rats, cats, and dogs have the power to reason.” (true) r =
+.84 (? missed)

“There is little that psychology can do for the normal person.”
(false) r = +.81 (? missed)

Factor II. This seems to be a “knowledge of special terminology”
factor. The items which were missed on this factor contained
terms whose meanings are not clear to the layman. Items
with high correlations are:

“Half the people in this country are below average in intelligence.”
(true) r = +.98 (9 missed)

“The unconscious mind is located just above the roof of the
mouth, directly back of the nose.” (false) r = +.82 (6 missed)

It will be noticed that both of these statements require special
knowledge about the terms contained in them. The terms “average
in intelligence” and “unconscious mind,” while familiar to
psychologists, do involve a terminology above the level to be 
expected of the layman. 

In contrast to these are the two statements which have the 
highest negative correlations: 

“Cats can see in complete darkness.” (false) r = -.81 (6
missed)

“A dog can sense impending disaster better than a man.”
(false) r = -.81 (14 missed)

While these last two statements do require a kind of special 
knowledge, namely that pertaining to perception, there is no 
problem of terminology here. 

Factor III. This factor appears to be the clearest of the four.
It has been labelled “conventional morality.” The items with
the highest positive correlations are:

“The majority of adult criminals are feeble-minded or very
nearly so.” (false) r = +.88 (4 missed)

"A child is born with a sense of good and evil—this is his con¬ 
science.” (false) r = +.80 (6 missed) 

"Being spanked may be pleasurable to a child." (true) r = 
+.80 (7 missed)

“A person who won’t look you in the eye is probably untrustworthy.”
(false) r = +.78 (1 missed)

Individuals high in this factor seem to have misconceptions
about good and evil. They seem to look upon the conscience as
something which is inborn. They appear to regard “bad” behavior
as being more modifiable through the “will” and intellectual
choice of the individual than factual evidence would
justify.

Factor IV .—This seems to indicate an “over-evaluation of 
learning ability,” particularly of children. Items with the high¬ 
est positive correlation are as follows: 

“Children memorize much more easily than adults.” (false)
r = +.98 (14 missed)

“The average infant would learn to walk two months earlier
than he does, if he were given the proper training.” (false)
r = +.91 (11 missed)

“The sense organs of touch, in a person with normal vision,
are just as sensitive as those in a blind person.” (true) r = +.72
(14 missed)

It is interesting to note that the item “It is probable that
man’s instinct to fight is the fundamental cause of wars”
(false) had a correlation of -.71 (5 missed).

Thus, it is possible to determine the factors of a given area 
and to carry out item analyses for hundreds of items from data 
from only a relatively few subjects. We may know how well 
each item measures each factor, as well as the level of difficulty 
of each item. For these reasons, this technique is particularly 
recommended for use in relatively unexplored areas such as 
aesthetics and ethics, where the investigator is faced with the 
problem of establishing the principal dimensions from an almost 
infinitely large number of items. To construct tests in an un¬ 
explored domain is costly and time consuming, particularly 
when the investigator does not know which items to start with. 
In the method suggested in this paper the investigator starts 
with every kind of item which he thinks might measure some¬ 
thing within the domain being considered. The results give him 
a rough idea of what the basic dimensions are. He also knows 
what groups of items are the best measures of these dimensions. 
He may then start a further analysis of the area, building his 
tests in the direction of the clusters and using the clusters of 
selected items as the basis for the selection of similar kinds of 
items. 
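
The item-analysis step described above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the computation used in the study: the responses and loadings are invented, the function names are hypothetical, and the point-biserial coefficient stands in for whatever biserial formula was actually applied. It correlates one true-false item with the factor loadings of the persons who answered it and counts how many persons missed the item.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_correct, loadings):
    """Point-biserial r between a 0/1 item vector and persons' factor loadings."""
    if len(item_correct) != len(loadings):
        raise ValueError("one response and one loading are needed per person")
    passed = [x for c, x in zip(item_correct, loadings) if c == 1]
    missed = [x for c, x in zip(item_correct, loadings) if c == 0]
    if not passed or not missed:
        raise ValueError("the item must be passed by some persons and missed by others")
    sd = pstdev(loadings)  # population standard deviation of the loadings
    if sd == 0:
        raise ValueError("loadings must vary across persons")
    p = len(passed) / len(item_correct)  # proportion passing the item
    return (mean(passed) - mean(missed)) / sd * sqrt(p * (1 - p))


def number_missed(item_correct):
    """Difficulty expressed, as in the text, by the number of persons missing the item."""
    return item_correct.count(0)


# Hypothetical data: four persons, one item (1 = answered correctly).
item = [1, 1, 0, 0]
loadings = [0.8, 0.6, 0.2, 0.0]
r = point_biserial(item, loadings)
missed = number_missed(item)
print(round(r, 3), missed)
```

With only a handful of persons such a coefficient is very unstable, which is consistent with the caution the text expresses about the small number of cases.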

To demonstrate this method of screening, eight items were
selected from the original 100. Each factor was represented by
two items. Each of these items had a high correlation with the
factor it represented, but low correlations with the other factors.
Each of these items was correlated with the other items, using
tetrachoric coefficients on the basis of the scores of the twenty
individuals who constituted the twenty variables of the original
matrix. This matrix, in which the items are the variables, is
presented in Table 5. The variables are grouped according to the

TABLE 5 

Intercorrelations of Items


Items 

(Factor HI) (Factor IV) (Factor II) (Factor I) 

1 2 3 4 5 e 7 a 


(ill) 

a 

+ •75 






3 

-.08 

+ .10 





(IV) 

4 

—.ai 

—, 18 

+ •55 




(11) 

l 

— .to 
"•35 

+ .08 
-.10 

-■45 

-■75 

-•13 

— .a a 

+•75 

“.36 

(1) 

7 

— .00 

“•45 

— *35 

.00 

— .16 

8 

-all 

.00 

+ .00 

+ .40 

-.30 

-.85 +.58 

TABLE 6 


Centroid Factor Loadings (For Items) 


Item

Factor I 

Factor II 

1 

.665 

-•753 

a 

3 

■394 

.8:3 

— .604 
• 335 

4 

•*55 

•356 

1 


— .I64 

-.486 

l 

— .aia 

4678 

.491 

.695 


factors which they represent. It is interesting to note at this 
point that variable 8 is the only one which has a significantly 
high positive correlation with any factor other than the one it 
was selected to represent (factor I). It is also of interest to know 
that this item has a positive biserial correlation with factor IV, 
while the two items representing factor IV have positive biserial 
correlations with factor I which are considerably above average. 

Thurstone’s centroid method of factor analysis was then
used to extract two centroid factors (Table 6). The fact that
only twenty cases were used in the calculations of the
tetrachoric correlations of the matrix placed a limit upon the number
of factors that might have been legitimately extracted without
going below the threshold of error variance. It is of interest to
note, however, that the pattern on the axes of the two extracted
factors consisted of four clusters of items. With the exception of



Fig. I. Projections of Factor Loadings upon Centroid and Rotated Axes.


variable 8, each variable is found with the other member of the 
pair representative of each factor. Variable 8 had significant 
loadings on both factor I and factor IV. The projections on the 
two rotated axes are shown in Figure I. 

Summary. This study was undertaken primarily as a
demonstration of methodology, although the factors obtained have
a pedagogical utility. The inverted factor technique was
employed so that the extracted factor loadings represented scale
values for individuals in regard to these factors. The various
items were then correlated with the factor loadings, and the
factors were described by those clusters of items which had the
highest correlations. The factors which emerged were fairly
clear cut. For each dimension, two items were then selected
which were highly saturated in that factor. When these eight
items were factor analyzed, a pattern of four clusters in two
dimensions emerged.

It is suggested that this type of approach be used in
unstructured domains in order to obtain a rough idea of the kinds
of items which might be used for the further factorial
investigation of such areas.


REFERENCES 

1. Garrett, H. E. and Fisher, T. R. "Prevalence of Certain Popular
   Misconceptions." Journal of Applied Psychology, X (1926),
   410-420.
2. Guilford, J. P. and Holley, J. W. "A Factorial Approach to the
   Analysis of Variances in Esthetic Judgments." Journal of
   Experimental Psychology, XXXIX (1949), 208-218.
3. Stephenson, W. "The Inverted Factor Technique." British Journal
   of Psychology, XXVI (1935-36), 344-361.
4. Thurstone, L. L. Multiple Factor Analysis. Chicago: Univ. of
   Chicago Press.
5. Valentine, W. L. "Common Misconceptions of College Students."
   Journal of Applied Psychology, XX (1936), 631-658.
6. Zimmerman, W. "A Simple Method of Orthogonal Rotation of
   Axes." Psychometrika, XI (1946).



OPINION AND ACTION: A STUDY IN VALIDITY OF 
ATTITUDE MEASUREMENT 


C. ROBERT PACE
Syracuse University 

The relationship between opinion and action is a practical 
topic which has rather basic theoretical importance as well. 

Opinion measurement has been attempted by a variety of 
scientists rather than by a concentration of talent in any single 
discipline. Thus, we find different techniques employed in public 
opinion polls, market research, studies of morale, management 
and job satisfaction, and in education. Political scientists, soci¬ 
ologists, social, clinical, personnel, and educational psycholo¬ 
gists, specialists in educational research, and specialists in 
measurement and evaluation have all made some contribution. 
While this diversity of approach may be advantageous, it is
equally likely that some confusion and superficiality have
resulted. Ample documentation of the latter was given in
McNemar’s (1) critical review of attitude-opinion methodology
three years ago. McNemar also stated that relatively few
validity studies had been made of attitude and opinion measuring
instruments.

Most definitions of attitude accept the proposition that an 
attitude is a tendency to act for or against some object or value. 
Most definitions of psychology describe psychology as the sci¬ 
ence of behavior, or concerned with the prediction and control 
of behavior. Advances in science are related to the precision of 
scientific measuring instruments; the value of a measuring in¬ 
strument is determined in large part by what you can do with 
the result obtained from it; and what you can do with the result 
depends on what relationships are known to exist between it 
and other variables. Thus, the value of an IQ resides largely in 
the fact that, having it, you can predict a person’s behavior or 
status in quite a variety of circumstances. Likewise, the value 
of an attitude measurement is largely dependent on knowing 



412 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 

what behavior is associated with it. Opinions are the verbalized 
expression of attitudes; opinions are not action. But, certainly 
some opinions should be correlated with action, just as some
aspects of college achievement should be correlated with scores 
on a college aptitude test. 

An opportunity to analyse some data bearing on this problem 
of opinion validity was provided by the replies of some 2,500 
Syracuse University alumni to a sixteen-page questionnaire
(2, 3). This alumni follow-up study was an attempt to describe
our educational product rather fully, examining his behavior 
with respect to some of the major objectives of general educa¬ 
tion in science, social science, and the humanities. The question¬ 
naire included seven Activity Scales of eleven items each, la¬ 
belled Politics, Civic Affairs, Religion, Art, Music, Literature, 
and Science. The subjects checked each activity they had en¬ 
gaged in during the past year. The scales have the property of 
Guttman-type scales in that participation in the more difficult 
activities tends to subsume participation in the easier and more 
common activities. The score on each scale was simply the num¬ 
ber of activities checked. Then we had nine Opinion Scales of 
six items each, labelled Politics, Civic Relations, Government, 
the World, Philosophy, Art, Music, Literature, and Science. 
The statements in the opinion scales were written to reflect 
basic concepts, insights, or appreciations which are among the 
objectives of general education. Each statement was answered 
on a five-point scale, from Strongly Agree to Strongly Disagree. 
Faculty experts in the fields sampled by the opinion scales 
tended to agree among themselves in their responses to the 
items and so it was possible to score each scale simply by count¬ 
ing the number of statements on which one’s opinion agreed with 
the opinions of the experts. With only two exceptions, for every
statement included in the scales the degree of consensus among
answers of the experts exceeded 2 to 1, and for 80 per cent of the
items the ratio exceeded 4 to 1. In another section of the
questionnaire, we had a list of eighteen objectives of general
education, which the alumni rated on a five-point scale of importance,
from “very important” to “of no importance.” These ratings,
of course, are also measures of opinion.
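
The expert-keyed scoring of the Opinion Scales can be sketched as follows. This is only an illustration under assumptions not stated in the text: responses are coded 1 (Strongly Agree) to 5 (Strongly Disagree), agreement with the experts is judged by direction rather than by exact scale position, and the function names and sample answers are hypothetical. Three statements are used for brevity, where each actual scale had six.

```python
from collections import Counter


def direction(response):
    """Collapse a five-point response into agree, neutral, or disagree."""
    if response in (1, 2):
        return "agree"
    if response in (4, 5):
        return "disagree"
    return "neutral"


def expert_key(expert_answers):
    """Return, for each statement, the direction chosen by most experts.

    expert_answers holds one list of responses per expert.
    """
    key = []
    for statement_responses in zip(*expert_answers):
        counts = Counter(direction(r) for r in statement_responses)
        key.append(counts.most_common(1)[0][0])
    return key


def scale_score(answers, key):
    """Number of statements on which a respondent agrees with the expert key."""
    return sum(direction(a) == k for a, k in zip(answers, key))


# Hypothetical responses of three experts to three statements.
experts = [[1, 5, 2], [2, 4, 1], [1, 5, 4]]
key = expert_key(experts)
score = scale_score([2, 5, 5], key)
print(key, score)
```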

Before reporting correlations between attitudes and activities
it is appropriate to note the reliability of the scales and the
items, for this obviously affects the size of any correlation
between them. Six months after our sample of 2,500 had filled out
the questionnaire (this represented a 50 per cent return from
those who had received it) we sent a second copy of the
questionnaire to a small group of 120, receiving 68 in return. The
test-retest consistency of scores over this six-month interval
was computed, using Pearson product-moment correlations.
For the Activity Scales, these ranged from .70 to .89 with a
median r of .83. For the nine Opinion Scales, the median
correlation was .65, with seven falling between .60 and .70, and
two very low ones: .40 and .31. Then we also checked the
consistency of responses item by item. For the Activity Scales the

TABLE 1

Correlations Between Scores on Activity Scales and Scores on Opinion Scales
(N = c. 2300)

Scales                                                     Correlation

Political Activity Score vs Political Opinion Score            .15
Civic Activity Score vs Civic Opinion Score                    .01
Religious Activity Score vs Philosophy Opinion Score           .29
Art Activity Score vs Art Opinion Score                        .37
Music Activity Score vs Music Opinion Score                    .40
Literature Activity Score vs Literature Opinion Score          .33
Science Activity Score vs Science Opinion Score                .14


average per cent of identical responses was 85, with a range 
from 83 to 87. For the Opinion Scales the average per cent of 
identical responses was 75 with a range from 68 to 84. 
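
The two consistency checks described above, a product-moment correlation between first and second scores and a per cent of identical item responses, can be sketched as follows. The scores and responses are invented for illustration and the function names are hypothetical; nothing here reproduces the study's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def percent_identical(first, second):
    """Per cent of items answered the same way on both occasions."""
    same = sum(a == b for a, b in zip(first, second))
    return 100.0 * same / len(first)


# Hypothetical scale scores of five respondents, six months apart.
first_scores = [3, 7, 5, 9, 1]
retest_scores = [4, 6, 5, 8, 2]
r = pearson_r(first_scores, retest_scores)

# Hypothetical item responses of one respondent (1 = checked, 0 = not checked).
agreement = percent_identical([1, 0, 1, 1], [1, 1, 1, 1])
print(round(r, 3), agreement)
```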

Correlations between activity and opinion scores are listed in
Table 1. The Political Opinion Scale was designed to measure
one’s belief in the value and importance of individual and group
participation in a representative government. The Political
Activity Scale is, presumably, a measure of the extent of
participation in various political processes, such as discussing and
reading about political matters, voting, writing letters, signing
petitions, giving and collecting money, etc. One might expect the
correlation between two such scales to be considerably higher
than .15. The Civic Opinion Scale was intended as a measure of
tolerance and acceptance of equality of opportunity for all
people. The Civic Activity Scale was intended as a measure of
community participation. The Philosophy Opinion Scale is
concerned with acceptance of a Christian and ethical set of values.
The Religious Activity Scale is concerned mainly with
participation in church-related activities. The opinion scales in Art,
Music, and Literature were intended to measure the general
sophistication and maturity of understanding in art, music,
and literature. The corresponding activity scales were designed
to reveal the frequency and depth of engagement in activities
related to art, music, and literature.

All these correlations are small, ranging from a low of .01 to a
high of .40. We did not construct the scales with the sole thought
of correlation between activities and opinions, although we
certainly hoped that the people whose opinions reflected the
greatest insight and understanding in the various fields would tend
also to be most active in those fields. This seems to be true to a
limited degree in art, music, literature, and religion, but the
tendency is practically non-existent in politics, civic affairs, and
science.
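
Since unreliability lowers an observed correlation, it may help to see how much the reported reliabilities could matter. The sketch below applies Spearman's correction for attenuation, which the author does not report using. Pairing the highest scale correlation (.40) with the median test-retest coefficients (.83 for the Activity Scales, .65 for the Opinion Scales) is an assumption made purely for illustration, because the reliabilities of the individual scales are not given; the function name is hypothetical.

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between two measures if both were perfectly reliable."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / sqrt(r_xx * r_yy)


# Observed r = .40; median reliabilities .83 and .65 (illustrative pairing only).
estimate = correct_for_attenuation(0.40, 0.83, 0.65)
print(round(estimate, 2))
```

Even on this generous assumption the corrected value remains moderate, so unreliability alone would not account for the small correlations in politics, civic affairs, and science.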

Some of the individual opinion items can appropriately be
paired with a corresponding activity item; for other opinions
and activities it seemed less reasonable to expect any
correspondence. Looking through the questionnaire, I selected 27
opinion statements against which it seemed plausible to
compare one or more of 39 activity items. Altogether I had 188
pairs of activity and opinion. For a simple calculation of
relationship, I used Thurstone’s tables for estimating tetrachoric
correlation coefficients. Seventy correlations have been
computed and they are the ones which seemed most likely to show
some correspondence between opinion and action.
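
For readers without access to such tables, a tetrachoric coefficient for a pair of this kind can be approximated from the fourfold table of opinion (held or not) by action (taken or not). The sketch below uses the well-known cosine-pi approximation, not the tables used in the study, and it is reasonably close only when both splits are near 50-50. The cell counts and the function name are hypothetical.

```python
from math import cos, pi, sqrt


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    a: opinion held, action taken      b: opinion held, action not taken
    c: opinion not held, action taken  d: opinion not held, action not taken
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts cannot be negative")
    if b * c == 0:
        if a * d == 0:
            raise ValueError("coefficient is undefined for this table")
        return 1.0  # no discordant cases at all
    return cos(pi / (1 + sqrt((a * d) / (b * c))))


# Hypothetical fourfold table for one opinion-action pair.
r_t = tetrachoric_cos_pi(40, 10, 10, 40)
print(round(r_t, 3))
```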

A distribution of the 70 correlations shows a median value of
.18, with a fourth at .07 or below, and another fourth at .30 and
higher. The lowest was -.05. The highest was +.54.

All of the correlations above .30 came from the fields of art, 
music and religion; none came from politics, civic affairs, or 
science. Literature was not included in these comparisons. 

Selected examples of these correlations are shown in Table 2.
It is clear that participation in various church-related and other
religious activities is definitely correlated with having a
favorable opinion toward the significance and importance of religion;
but these activities are less clearly related to more general
opinions about the importance of understanding the meaning and
values in life. Opinions about art and music which reflect a
sophisticated and mature understanding and interest tend to be
accompanied by participation in various art and music
activities. In the field of politics, on the other hand, the relations
between opinion and action approach zero.

TABLE 2

Correlations Between Specific Opinions and Specific Actions

Opinions
    Actions                                                  Correlation

PHILOSOPHY and RELIGION

Disagree with the statement that: Religion has little to offer
intelligent and scientific people today
    I belonged to a church
    I contributed a regular sum of money to a church
    I served on some volunteer church committee
    I prayed
    I read selections from my Bible

Rate very important as objective of college education:
Understanding the meaning and values of life
    I belonged to a church
    I contributed ...
    I prayed
    I read ... Bible

ART and MUSIC

Disagree with the statement that: Modern painting—impressionism,
expressionism, cubism, surrealism, and the rest—is mostly the
work of crackpots
    I visited an art gallery or museum
    I attended an exhibition of contemporary painting
    I read one or more books about art, artists, or art history

Rate very important or important as objective of college education:
Developing an understanding and enjoyment of art and music
    I visited an art gallery ...
    I attended an exhibition of contemporary painting
    I read one or more books about art ...
    I listened to some serious music by contemporary composers
    I listened to symphony programs on my radio at least once a month
    I read one or more books about music, musicians, or music history

Disagree with the statement that: The tendency of some modern
composers to use strange harmonies and discords makes for poor music
    I listened to ... serious contemporary music ...
    I listened to symphony programs
    I read ... books about music ...

Disagree with the statement that: There has been little or no
outstanding music composed in the 20th Century
    I listened to ... serious contemporary music ...
    I listened to symphony programs
    I read ... books about music ...

Agree with the statement that: Radio should give people much
more opportunity to hear good serious music
    I listened to ... serious contemporary music ...
    I listened to symphony programs
    I subscribed to some orchestral or musical concert series

POLITICS

Disagree with the statement that: Sending letters and telegrams to
congressmen has little influence on legislation
    I wrote a letter or sent a telegram to a public official

Agree with the statement that: Pressure groups are useful and
important features of democratic government
    I wrote a letter or sent a telegram to a public official     .10
    I signed a petition for or against some legislation          .06
    I contributed money to some political cause or group         .10

Rate very important as objective of college education: How to
participate effectively as a citizen
    I voted in the last primary or local election                .03
    I signed a petition ...                                      .15
    I wrote a letter or telegram                                 .10
    I contributed money ...                                      .05

Rate very important as objective of college education:
Understanding world issues and pressing social, political, and
economic problems
    I listened at least once a month to speeches and
    discussion programs on the radio dealing with national
    and international problems                                   .26
    I read one or more books about politics                      .18

An interesting phenomenon occurs in many of these com¬ 
parisons between specific opinions and specific actions. People 
who hold their opinions "strongly” tend to engage in the related 
activities whether it makes sense or not. For example, among 
those who feel strongly that sending letters and telegrams has 
some influence, 37 per cent wrote a letter or sent a telegram. 
Among those who agree that it has some influence but do not 
feel strongly about it, 23 per cent wrote a letter or sent a tele¬ 
gram. Among those who had no opinion one way or the other, 
10 per cent engaged in the activity. Then, among those who 
tended to think it had little influence, 18 per cent did it anyway; 
and among people who were convinced it had little influence, 
33 per cent engaged in the activity. Another example: among 
people who feel strongly that religion does have something to 
offer intelligent people today, 86 per cent belonged to a church 
and 76 per cent contributed a regular sum of money to a church.
Among those who agree, but not strongly, that religion is
worthwhile today, 72 per cent are church members and 56 per cent
contribute money to the church. Among those who have no
opinion one way or another, 41 per cent belong to a church and
33 per cent contribute money. But, among those who are
convinced that religion has little to offer, 54 per cent belong to a
church and 47 per cent contribute money regularly to it. For
the activity “I prayed,” the percentages drop from 86 to 34 and
then rise to 54.

In general, opinions regarding the importance of the various 
goals of higher education do not exhibit this U-shaped curve 
in relation to participation in the corresponding activities. For 
example, among people who rated “Understanding world issues 
and pressing social, political, and economic problems” as “very 
important,” 82 per cent listened to radio speeches and dis¬ 
cussions at least once a month. Among those who rated it as 
“important,” 72 per cent listened. Among people who thought 
it was “of some importance,” 64 per cent listened, and among 
people who thought it was of little or no importance, 50 per 
cent listened. With respect to “I read one or more books about 
politics” the corresponding percentages were 27, 15, 9, and 
zero. 
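
The contrast drawn above between the U-shaped pattern and the steadily declining one can be checked mechanically. The sketch below uses percentages quoted in the text, ordered from the most favorable opinion category to the least favorable; the helper name `is_u_shaped` and its strict definition (a fall to a single interior minimum followed by a rise) are assumptions made for illustration.

```python
def is_u_shaped(percentages):
    """True when the values fall to a single interior minimum and then rise."""
    low = percentages.index(min(percentages))
    if low == 0 or low == len(percentages) - 1:
        return False  # the minimum lies at an end, so the trend is one-directional
    falling = all(a > b for a, b in zip(percentages[:low], percentages[1:low + 1]))
    rising = all(a < b for a, b in zip(percentages[low:], percentages[low + 1:]))
    return falling and rising


# Per cent who wrote a letter or telegram, by opinion on whether it has influence.
letter_writing = [37, 23, 10, 18, 33]
# Per cent who listened to radio discussions, by rated importance of world issues.
radio_listening = [82, 72, 64, 50]
# Per cent who read books about politics, by the same ratings.
politics_books = [27, 15, 9, 0]

print(is_u_shaped(letter_writing), is_u_shaped(radio_listening), is_u_shaped(politics_books))
```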

Or take an illustration from Art. Forty-seven per cent of the 
people who rated “Developing an understanding and enjoy¬ 
ment of art and music” as “very important” said they had at¬ 
tended an exhibition of contemporary painting. Only 6 per cent 
of those who considered this to be of little or no importance 
had attended such an exhibition. Also, in music activities, of 
those who rated the objective very important, 84 per cent 
listened to radio symphonies at least once a month in contrast 
to 46 per cent among those who regarded the objective as of 
little or no importance. 

What conclusions can we draw from these figures? There 
seems to be some correlation, generally in the .20’s and .30’s, 
between belief in the importance of some field and participa¬ 
tion in activities in that field. This was true of Art and Music, 
and to a lesser extent of Religion and Politics. It is also true of 
science, although I have not reported those correlations. There 
seems to be a reasonable correlation between specific opinions
and specific actions in Art, Music, and Religion, again
generally in the .30’s and .40’s. In politics, however, I found no
correlation higher than +.15 between a specific opinion and a
specific action which might be expected to be associated with it.

Many of the correlations in this study may be thought of as 
rather high. This is so if one considers the probability that 
there may be an additive or reinforcing effect among related 
opinions and the further probability that such factors as op¬ 
portunity for action, multiple actions, and variations in in¬ 
tensity of opinion all may serve to depress the size of correla¬ 
tions between single opinions and single actions. Moreover, 
tetrachoric coefficients tend to be lower than Pearson product- 
moment coefficients. The present study is primarily exploratory 
rather than analytical; it reports relationships in a wide range 
of fields based on data designed broadly to throw light on the 
status of the educational product rather than data specifically 
collected to analyze relationships between opinions and actions. 
Yet so limited is our knowledge of the validity of many opinion 
measurements that one of our basic needs is to collect all the 
information we can from whatever sources so that ultimately 
critical analysis and theory can be more soundly attempted. 

After the failure of the public opinion polls to predict the 1948 
Presidential election, attention was focused anew on the re¬ 
lationship between expressed opinion and behavior. The poll¬ 
sters were quick to claim that their failure in the election had 
no bearing at all on the value of their regular reports describing 
the public’s attitude on a great variety of complex issues such 
as labor relations, internationalism, European reconstruction, 
relations with Russia, etc. The fact is, however, that there is 
little or no published evidence of the relationship between such 
attitudes and behavior. Until we have more evidence of the re¬ 
lation between opinion and action, we must regard many of the 
opinion polls and attitude surveys in the same way that we re¬ 
gard most other magazine and newspaper reports—namely, as 
interesting observations to be treated with a critical open- 
mindedness. 

Advances in the science of attitude measurement will come
in proportion to our ability to establish clear relationships
between opinion and action. Until we do this, our so-called
measurements will remain purely descriptive. What we must seek is
measurement that is both descriptive and predictive of
observable behavior.



ESTIMATING INTELLIGENCE BY INTERVIEW

JOSEPH V. HANNA 
New York University 

The interview had, until recently, been too long neglected
among psychologists as yielding promising materials for
research. This neglect, in the writer’s opinion, is due to two main
causes. In the first place, several early and too sketchy
experiments yielded results which tended to establish that
interviewing techniques and methods were not sufficiently valid to
be taken seriously (5, 8, 14). The results of these studies were
widely quoted by influential writers, and undoubtedly had the
effect of restraining younger clinical and applied psychologists
from initiating research projects aiming at the appraisal of
interviewing methods and skills. It is a strange paradox that,
at the same time, interviewing was nevertheless accepted among
psychologists as necessary, and many handbooks and manuals
dealing with “acceptable” practices in interviewing were widely
used.

A second major reason for the neglect of careful studies of 
interviewing stems out of the rapid development and use of 
aptitude tests. Why struggle with a large number of variables 
in intricate and baffling combination, when a single test which 
yielded a measurable correlation with a criterion, could be em¬ 
ployed? Individuals were selected for specific jobs on the basis 
of test scores. Intelligence tests were used widely in appraising 
academic capacity. Yet responsible techniques for dealing with 
the total person were too frequently absent.

The last few years prior to World War II had witnessed the
emergence of a keen interest in a more careful analysis and
improvement of interview techniques and skills. Several
partially independent efforts contributed to this revival. Greater
care was exercised in the interviewing of applicants for
employment, and there was developed a more standardized
framework for the interview (7). The use of interviewing in
advertising research and opinion polling invited more critical
attention to such aspects as level of diction, form of the
question, and the like. These more objective methods tended to
inject themselves into interviewing practice in the areas of
clinical and abnormal psychology, vocational counseling, and
other fields. Occasional books appeared which epitomized the
best research results and applied efforts to interviewing (1, 12).
During World War II such instruments as biographical records,
rating scales, and careful interview procedures made an
impressive contribution to methods of appraising personnel (10).
All of these efforts have grown out of the feeling that, as valuable
as testing is, it is not enough, and that such methods must be
supplemented by techniques and procedures which deal with
the total person.

The study here reported has to do with the use of interview
procedures in estimating the intelligence of clients seeking
vocational counsel. By “intelligence” is meant that capacity which
is measured more or less accurately by the usual test of
intelligence. While the information available to the writer had
bearing on a rather wide range of adjustments, the information
synthesized in the process of the interview is drawn upon only
to the extent of indicating the client’s cleverness, alertness, or
capacity usually referred to as general intelligence.

Procedure 

Fifty-four subjects, 50 men and 4 women, were used in the
study. They were drawn from applicants to the counseling
service of which the writer was in charge, for assistance in
deciding what occupation to enter, in choosing appropriate
courses of study, and related problems.1 The subjects were
taken in order of application, no specifications being made as
to age, sex, or other qualities. Care was exercised, however, to
eliminate from the sampling all subjects who were introduced
to the writer in such a way as to give any indication of
background, nature of problem, or abilities and limitations. Those
subjects with whom the writer had contacts prior to the
preliminary interview were also excluded from the sampling. The
estimates of intelligence were based solely on the information
secured from the subject, independently of any informal or
official reports from other sources. Within the period covered
by the study, about 40 per cent of the applicants for the
counseling service were eliminated from the sampling due to such
prior information and reports.

1 The Personal Counseling Service, West Side Y.M.C.A., New York City. The
study was completed shortly before the United States became involved in World
War II.

The subjects were remarkably heterogeneous. They ranged 
in age from 16 to 44, with a modal age of 17, an average age of 
25.9, and a median age of 24.9. Education varied from no 
formal grade completed to status in graduate and professional 
school, the average grade completed being 11.6. Intelligence, 
as measured later, varied from a percentile rank of 2 to 99 
plus. The group of fifty men and four women included several 
refugees from European countries as the result of Nazi perse¬ 
cution. 

One of the requirements of the counseling service was that
the client fill out Aids to the Vocational Interview, an
eight-page blank published by the Psychological Corporation. This
blank provided space for a fairly comprehensive recording of 
the client’s family background, educational, vocational, and 
avocational interests and experiences, self-estimates of abilities 
and the like. It was usually filled out by the client following 
the preliminary interview. For the clients dealt with in the 
study, however, the blank was filled out prior to the pre¬ 
liminary interview. The interview required from 20 to 35 
minutes. The estimate of intelligence was in all instances limited 
to the impressions obtained from the subject in the process of 
the interview. The filled-in Aids was helpful, especially, in 
reducing the time which would have otherwise been required 
for each interview. Following the interview with each subject 
the estimate of intelligence was made in terms of a fancied 
percentile score such as the client would be expected to make 
on a test of intelligence suitable for entering college freshmen, 
and in competition with such a selected group. This procedure
was decided upon for the sake of uniformity, irrespective of
the subject’s age or educational background.

The estimate of intelligence was based on the principle of 
internal consistency, it being assumed that from a reasonably 
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wide range of cues and impressions there would emerge a con¬ 
stellation or cluster of such items, each item of which is “valid” 
by agreement with the others. Irrelevant or misleading cues, 
not being typical of the trend, would be rejected as invalid 
(j n ). It should be held in mmd that any single item of 
information offered by the subject, or any single impression of 
the counselor may or may not be a valid cue. The test of its 
validity is whether or not it fits in with other cues secured 
from a variety of sources and directions If so, then it may be 
assumed to be valid. It is obvious, however, that the validation 
of any such cue places a burden upon the interviewer to tap a 
sufficiently wide area of the subject’s background and present 
status as to reduce to a minimum the chances of error in 
judgment. If the exploration is too limited in scope any one 
cue may be weighted unduly, leading to erroneous appraisal of
the trait or quality being estimated. Such errors undoubtedly 
contributed to errors of estimate to be reported later in the 
present study. 

The writer will not attempt to offer a complete list of cues 
utilized in estimating intelligence. To do so would be impossible
due to the subtlety or obscurity of certain cues and relationships 
synthesized on the basis of overall, intuitive judgment. A 
listing of the more important and obvious cues, however, may 
be helpful: (1) subject’s report of school grades earned; (2)
subject’s reported membership in honor clubs and societies; (3) 
subject’s reported standing in school class; (4) reported dis¬ 
tinctions and achievement outside of school; (5) reported leader¬ 
ship ability; (6) certain hobbies and activities such as chess, 
bridge, athletic activities, etc.; (7) conversational ability, use 
of words, etc.; (8) extent and nature of materials read; (9) 
activities obviously of compensating nature; (10) range of activities,
varied or limited; (11) manner and style of responding
to questionnaire items; (12) spelling ability; (13) age
in relation to grade completed in school, whether over-age, accelerated,
etc. The following constellation of cues, for example, would
point to high intelligence: membership in school honor society,
reported high-school average of 95, discriminating use of words
in conversation, more interest in English, mathematics, physical
sciences and foreign languages than in the more general
subjects, enjoyment of chess as a hobby, reading of sophisticated
books and periodicals. The following constellation would
point to limited intelligence: just average grades, “not much of
a student,” narrow range of vocabulary and lack of discriminating
choice of words in conversation, habitual reading of
tabloids and popular periodicals, more interest in general
subjects such as history than in the more exacting subjects,
unusual emphasis on physical activities, over-identification with
a limited hobby. A highly intelligent individual may prefer to
read tabloids to other newspapers. The individual with mediocre
or low intelligence may unconsciously or otherwise exaggerate
his school standing even to the point of indicating honor society
membership. Such erroneous cues generally do not fit into the
constellation which seems generally typical of the individual,
and can be discarded as invalid.

One further explanation should be made here. The writer 
made no attempt to weigh or evaluate each cue separately as 
has been done occasionally in the scoring of interview forms, 
and application blanks (3, 6, 13, 15). He rather trusted to his 
judgment to sense the less tangible relationships along with the 
more obvious cues in arriving at his final estimate. 

Following the interview and estimate, all subjects were given 
a battery of tests including two tests of intelligence: the
American Psychological Examination for College Freshmen, and
the Ohio State University Psychological Test. The first is a time-limit
test and the second a work-limit, or power, test. Estimated
and actual percentile scores and errors of estimate for the 54 
subjects are given in Table 1. Distributions of actual percentile 
scores on the A. C. E. and Ohio, and of estimated percentiles, 
show the population to be of considerably above-average intel¬ 
ligence. However, the subjects used in the present study are 
rather typical of clients in general who, throughout the years, 
applied for counseling to the Personal Counseling Service. All 
previous studies made of the counselee clientele show above 
average distributions of intelligence (4). 

Results 

The actual percentile scores on each test were correlated with
the estimated percentiles of intelligence and the two tests were
correlated with each other, by the Pearsonian product-moment
formula. The following correlations were obtained: A. C. E.
with estimates, r = .71; Ohio with estimates, r = .66; A. C. E. 
with Ohio, r = .77. It will be observed that agreement between 
estimated percentile scores and scores on each of the two tests 
of intelligence is just slightly lower than the correlation be¬ 
tween the tests. This poses an interesting question as to which 
of the two instruments or techniques would be the more valid 
in predicting educational or other achievement. 
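
The product-moment computation behind these coefficients can be sketched as follows. This is a minimal illustration in Python, not the author's worksheet; the function name `pearson_r` and the five pairs of percentile scores are hypothetical and are not taken from Table 1.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical estimated and tested percentiles for five subjects.
estimated = [90, 75, 60, 85, 40]
tested = [88, 70, 65, 95, 35]
r_estimate_vs_test = pearson_r(estimated, tested)
```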

The results will be examined briefly for the purpose of identifying,
if possible, any errors which may account for the deviation
of estimates from actual scores. Both tests of intelligence
were taken by 45 of the 54 subjects. For these the average of
the two test scores was taken as a basis for comparison with 
estimated scores. The difference between the estimated score 
and the average of the test scores is designated “overestimation”
or “underestimation.” Examination of Table 1 will
show that 15 subjects were overestimated, and 14 were underestimated,
by a margin of 10 or more percentile points. The
average error of overestimation was 25.8 and of underestimation,
14.4 percentile points. The highest error of estimation
was 49 percentile points, one subject being overestimated by 
this margin. One subject was underestimated by a margin of 
44 percentile points. For 25, almost half the subjects, however, 
the error of estimation was 10 or less percentile points. 
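
The comparison described above can be sketched in a few lines of Python. The function names and the sample scores are hypothetical; the only elements taken from the text are the use of the mean of the two test percentiles as the criterion and the margin of 10 percentile points.

```python
def estimation_error(estimate, ace_percentile, ohio_percentile):
    """Signed error: estimated percentile minus the mean of the two test percentiles."""
    return estimate - (ace_percentile + ohio_percentile) / 2

def classify(error, margin=10):
    """Label a signed error using the 10-point margin described in the text."""
    if error >= margin:
        return "overestimated"
    if error <= -margin:
        return "underestimated"
    return "within margin"

# Hypothetical subject: estimated at the 80th percentile, tested at the 60th and 70th.
label = classify(estimation_error(80, 60, 70))
```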

For those subjects for whom the error of estimate was ten or 
more percentile points, a study of the records and such notes 
as had been made following the interview was made with the 
hope of identifying the factors responsible for the deviation. It 
is obvious that such an examination cannot be wholly objective. 
A preliminary inspection, however, had indicated unmistakably
the presence of at least one such factor. Examination of data
in columns 7 and 8, Table 1, shows clearly the tendency to
overestimate the intelligence of younger subjects. Of the 19 subjects
eighteen or below, 10 were overestimated by ten or more
points, whereas only 3 were underestimated by this margin. 
Of the 35 who were nineteen or above, only 5 were overesti¬ 
mated by ten or more points, whereas 11 were underestimated 
by this margin. The tendency to underestimate the intelligence 
of older subjects, however, is not as clear as the tendency to 
overestimate the intelligence of younger clients. 
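
The age comparison can be restated as proportions within each age group. The sketch below uses only the counts given in the paragraph above; the dictionary layout and the function name `rates` are illustrative assumptions.

```python
# Counts quoted in the text: group size, and subjects over- or underestimated
# by ten or more percentile points.
age_groups = {
    "eighteen or below": {"n": 19, "over": 10, "under": 3},
    "nineteen or above": {"n": 35, "over": 5, "under": 11},
}

def rates(group):
    """Return (proportion overestimated, proportion underestimated) for one group."""
    return group["over"] / group["n"], group["under"] / group["n"]

younger_over, younger_under = rates(age_groups["eighteen or below"])
older_over, older_under = rates(age_groups["nineteen or above"])
```

On these counts the proportion overestimated is far higher among the younger subjects, while the difference in underestimation is smaller, which is the asymmetry the paragraph describes.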

The further examination of the filled-in Aids, in addition to 
casting light on the importance of the age factor also indicated 
roughly several additional factors which seem to have bearing 
on errors of estimation. These items, impressions, etc., were 
summarized and appear in Tables 2 and 3. Several items in
Table 2 show higher frequency among those overestimated, and
in Table 3 for those underestimated.

Reports by subjects of scholarship standing as indicated by 
grades, position in class, and the like, are apparently the most
important single source of errors of estimation, there being a
tendency to rate individuals higher who reported scholarship
standing in or near the upper quarter of their class, and a 
corresponding tendency to rate those lower who reported 
scholarship below average. Other characteristics of those under¬ 
estimated were taciturnity, evidence of mediocre reading habits, 
the early selection of specializing courses such as shop work, 

TABLE 2

Characteristics of Subjects Overestimated by Ten or More Percentile Scores

                                                                   No.
Reported outstanding specific aptitudes
  (mathematics, technical, music, art, etc.) ....................   9
Reported high scholarship, regents grades, honors, etc. .........   7
Conversational ability, easy flow of words ......................   3
Good habits of application ......................................   2
Good looks ......................................................   2
Miscellaneous ...................................................   6
  (One frequency each for the following characteristics: well dressed,
  practical judgment, good vocational adjustment, self-assurance,
  foreigner* with language difficulty, physical handicap due to birth
  injury*)

* In instances such as this it is doubtful if test scores indicate actual level of
intelligence.


TABLE 3

Characteristics of Subjects Underestimated by Ten or More Percentile Scores

Taciturn, uncommunicative
Mediocre reading habits
Early specializing courses
Emotionally maladjusted
Miscellaneous
  (One frequency each for the following characteristics: poor speller,
  frequent school absences, poor study habits, marked facial asymmetry,
  dull appearance, extreme dependence on others)

No. (in printed order; row alignment not recoverable): 7, 5, 5, 4, 3, 2, 6



typing and the like, and emotional maladjustment; and of those
overestimated, good conversational ability, appearance, and
positive traits of personality. In a good many cases inflated,
sketchy or too modest reports were corrected on the basis of
additional items of contra-information. It can readily be seen,
however, that a paucity of such “rounding out” information
might lead to the acceptance at face value of questionable
information as fact, resulting in mistakes in estimates. Had the
writer been more thorough and searching in his interviewing
some of the errors could probably have been avoided or reduced.

Summary of Results 

1. It is possible to estimate intelligence test scores with considerable
validity from the contents of the interview. Correlations
between estimates and test scores are .71 and .66, just slightly
lower than the correlation between the two tests, .77.

2. There was a tendency to overestimate the intelligence of
younger subjects, and to a lesser extent to underestimate the
intelligence of older clients.

3. Underestimation and overestimation of intelligence seem
to be related also to reported achievement, reported specific 
aptitude, negative and positive personality qualities, habits of 
application, and the like. 

Discussion of Results 

While it seems clearly possible to estimate intelligence, ability 
to learn, etc., by interview, it also seems unmistakably clear 
that the validity of the estimates will depend on two general 
factors or conditions. First, there must be available a sufficient 
range of reported information, together with reasonably ade¬ 
quate facilities for interviewing. Second, the experience, com¬ 
petency and skill of the interviewer would seem to be a primary 
requisite for the validity of estimates. 

The relative values of estimates and actual test scores require 
discussion of a further possibility. Heretofore in the present 
discussion the differences between actual and estimated scores 
have been referred to as “errors of estimation” on the tradi¬ 
tional assumption that actual test scores should be the more 
valid in predicting scholastic and related types of achievement. 
It is obvious, however, that in the absence of objective vali¬ 
dation of either estimates or tests for the group of subjects here 
studied, the relative validities of estimates and tests can only 
be a matter of conjecture. It seems appropriate to postulate 
that in dealing with groups such as here reported, careful interviewing
based on materials supplied by the individual himself
and impressions gained from such interviewing, independently
of official evidence of past performance, grade transcripts, and
so on, can be at least as valid in predicting further educational 
and related achievement as a good test of intelligence. The 
interview, operating at its highest level, however, is not offered 
as a substitute for tests of intelligence. The conclusion offered 
is a reminder to counselors that the storehouse of information 
available through systematic interviewing, a source too little 
utilized by many counselors, should not be neglected; and that 
in the appraisal of the capacities and interests of the client the 
interview based upon such experience must be regarded as an 
essential supplement to the more objective measures. In closing 
it seems appropriate to suggest that counselors in training would 
find it good practice to utilize interviewing procedures in estimating
the intelligence of clients in advance of testing.

INCLUSION OF “NONE OF THESE” MAKES
SPELLING ITEMS MORE DIFFICULT

MARCIA BOYNTON
U. S. Civil Service Commission

A special study of the spelling items in its Clerk and Stenographer-Typist
Examinations has been undertaken by the U. S.
Civil Service Commission. All general-test items of these examinations
are subjected to systematic statistical evaluation,
but further analysis is being made of this one type. The purpose
of the study is to determine what elements make for item diffi¬ 
culty, in order to establish guides for improving the control of 
difficulty in the many alternate forms of examinations required. 
The amount of information is insufficient as yet to warrant any 
conclusions. However, a few findings are emerging. 

An indication of the value of the alternative “none of these” 
is one of the preliminary findings. Each of the spelling items has 
three alternative spellings of a single word, with “none of these” 
as a fourth alternative. The competitor is instructed to select 
the correct spelling, if any, or to select the fourth alternative. 

Although an item type with only four alternatives is not so 
desirable as one with five, so few words lend themselves to a 
sufficient variety of plausible misspellings that the use of five 
choices was not undertaken. It is recognized that the use of
various misspellings is undesirable for the further reason that it 
emphasizes wrong instead of correct spellings. To avoid both of 
these objectionable features, each item could include four or 
five different words. The use of different words in this way, how¬ 
ever, presents too great a problem to test constructors in two 
respects. First, it exhausts the supply of suitable words too 
quickly in view of the constant need for new sets of examination 
papers. Second, it increases too greatly the number of words 
which must not appear in any of the instructions, the vocabu¬ 
lary items, the reading items, or the grammar items of the same 
test booklet.

The purpose of including “none of these” as an alternative
was to increase the number of possible alternatives, thereby
reducing the chance that competitors' guesses will be correct.
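
The effect on blind guessing can be shown with a one-line calculation. The sketch assumes a competitor who picks among the alternatives purely at random, which is a simplification introduced here and not a claim made in the study; the function name is hypothetical.

```python
from fractions import Fraction

def chance_of_correct_guess(n_alternatives):
    """Probability that a purely random choice hits the keyed answer."""
    if n_alternatives < 1:
        raise ValueError("an item needs at least one alternative")
    return Fraction(1, n_alternatives)

three_spellings_only = chance_of_correct_guess(3)      # three spellings, no "none of these"
with_none_of_these = chance_of_correct_guess(4)        # three spellings plus "none of these"
```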

Analysis of competitors’ answers shows that an item that does 
not include the correct spelling is much more likely to prove 
difficult than an item in which the correct spelling appears. As 
is to be expected, an item in which there are two or more points 
of difficulty is more likely to prove difficult than an item in 
which there is only one such point. For example, in a sample 
item, “occasion,” a poor speller might wonder whether to use a 
single consonant, or whether to double both the “c” and the 
“s.” The two findings are consistent, since a constructor would
not be able to devise three attractive misspellings of a word un¬ 
less it contained more than one point of plausible misspelling. 



A TABLE AND AN ABAC FOR TESTING THE 
SIGNIFICANCE OF RHO 


FRANK M. DU MAS 
University of Texas 


I. Introduction 

Statisticians have developed several indices of relationship 
based on ranks. It seems necessary, therefore, to explicitly de¬ 
fine the statistical quantity with which this paper is concerned. 
This statistical quantity derives from Spearman; it is usually
called the coefficient of rank difference correlation, and will be
referred to in this paper as rho or ρ. Rho is defined as

$$\rho = 1 - \frac{6 \sum d^{2}}{N(N^{2} - 1)} \tag{1}$$

where Σd² is the sum of the squared differences between paired
ranks and N is the number of pairs of ranks.
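
As a worked illustration of the definition, the following Python sketch computes rho as one minus six times the sum of squared rank differences, divided by N(N² − 1). The function name and the sample ranks are hypothetical, and the sketch assumes the ranks contain no ties.

```python
def spearman_rho(ranks_x, ranks_y):
    """Rank-difference correlation for two untied rankings of the same N items."""
    if len(ranks_x) != len(ranks_y) or len(ranks_x) < 2:
        raise ValueError("need two rankings of equal length, at least 2 items")
    n = len(ranks_x)
    # Sum of squared differences between paired ranks.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Hypothetical rankings of five individuals on two measures.
rho_example = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

Here the squared differences sum to 4, so rho is 1 − 24/120.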


II. Older Method of Testing the Significance of Rho

The older method of testing the significance of rho is to 
compute the standard error of rho, divide rho by its standard 
error, enter the normal probability table with the quotient, 
and then make a statement concerning the probability of obtaining
at least rho = 0 in future samples of the same size
taken from the same population. The standard error of rho,
σ_ρ, is usually computed from formula (2) as follows:

$$\sigma_{\rho} = \frac{1.04\,(1 - \rho^{2})}{\sqrt{N - 1}} \tag{2}$$


There are at least three criticisms of this method of testing 
the significance of rho. First, formula (2) is only a rough approximation
of the standard error of rho. Second, the distribution
of rho is markedly skewed when rho is moderate or large
and, therefore, the normal probability table should not be used.
Third, in those instances where rho is most frequently applied
(say, when N < 9), the sampling distributions of rho are most
peculiar. When N = 3 or 4, they are bimodal; when N =
6, 7 or 8, they have a serrated profile. When N > 9 the distribution
may be said to be unimodal, and as N → ∞ the
sampling distributions approach the normal distribution as the
limit. However, in every case the sampling distributions, for
ρ = 0, are symmetrical. The methods that follow obviate these
criticisms to a considerable degree.

[Fig. I. Abac for Testing the Significance of Rho When N > 9. Horizontal axis: N.]

III. Newer Method of Testing the Significance of Rho 

The sampling distributions of rho when N > 9¹ may be
said to be unimodal. Actually, these distributions have a sawtooth
profile which tends to smooth out as N increases and
approach the normal distribution as the limit. We shall assume
the population of rho for samples of N > 9 to be normally
distributed. Under this assumption, we may then enter Student’s
distribution and test the significance of rho. Kendall
(1, p. 401) suggests these assumptions and procedures by an
example in which N equaled 10. We shall use this procedure
when N > 9.

¹ This value is chosen arbitrarily; we could have chosen 8, 10, 11, 12, etc.

TABLE 1

Table for Testing the Significance of Rho When N < 9. Values with an Asterisk are
Probabilities Rather than Levels of Confidence

Figure I is an abac to be used in testing the significance of
rho when N > 9. Formula (3) has been suggested by Kendall
(1, p. 401) as appropriate for testing the significance of rho:

$$t = \rho \sqrt{\frac{N - 2}{1 - \rho^{2}}} \tag{3}$$

Since N − 2 are the degrees of freedom, df, we may substitute
and solve for ρ. When this is done we have

$$\rho = \frac{t}{\sqrt{t^{2} + df}} \tag{4}$$
It was then an easy matter to enter formula (4) with t and 
df for the various levels of confidence shown in Figure I. The 
contours in Figure I indicate changes in rho as a function of 
the sample size with the level of confidence for rejecting the 
null hypothesis as the parameter. 
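
The construction of such contours can be sketched in Python. The sketch assumes the usual form of Kendall's statistic, t = ρ√((N − 2)/(1 − ρ²)) with df = N − 2, and its algebraic inverse ρ = t/√(t² + df); both forms are stated here as assumptions, and the function names are hypothetical. The critical t value in the example is read from a standard table of Student's distribution and is not taken from Figure I.

```python
from math import sqrt

def t_from_rho(rho, n):
    """t statistic for rho, treating rho as normally distributed (assumed form)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return rho * sqrt((n - 2) / (1.0 - rho * rho))

def critical_rho(t_crit, df):
    """Smallest absolute rho that reaches a given critical t, with df = N - 2."""
    return t_crit / sqrt(t_crit * t_crit + df)

# Example: N = 12 gives df = 10; 2.228 is the two-tailed 5 per cent point
# of Student's t for 10 degrees of freedom.
rho_needed = critical_rho(2.228, 10)
```

Repeating `critical_rho` over a range of df for each tabled t value yields one contour per level of confidence.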

Figure I may be used in the following manner. Assume a
sample of N = 17 and ρ = .38. Entering Figure I with these
values we find that we may reject the null hypothesis at the 
5 per cent level of confidence. 

Because of the unusual characteristics of the sampling dis¬ 
tributions of rho when N < 9, the t test of significance would 
be inappropriate. But it is precisely for the small values of N 
that a significance test is needed so badly. For example, clinical 
research is often an intensive study of a few individuals and 
rho is often used in such situations. Table 1 was constructed
with these considerations in mind. Kendall (1, Table 16.2) has
tabled the probability of obtaining the various values of Σd²
for several different values of N. The transformation of Σd²
and probabilities into rho and levels of confidence is obvious.
Table 1 allows us to make (within rounding errors) an ‘exact’
test of the null hypothesis for samples of 4 to 8 cases.

Table 1 may be used in the following manner. Assume a
sample of N = 7 and ρ = .57. We may then reject the null
hypothesis at the 20 per cent level of confidence.
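
For small N the exact null distribution can also be obtained by listing every possible ranking. The Python sketch below is an illustrative alternative to reading Kendall's table, not the procedure used to build Table 1; the function names are hypothetical and untied ranks are assumed.

```python
from collections import Counter
from itertools import permutations

def sum_d2_counts(n):
    """Count how often each value of the sum of squared rank differences
    occurs over all n! equally likely rankings."""
    base = tuple(range(1, n + 1))
    counts = Counter()
    for perm in permutations(base):
        counts[sum((a - b) ** 2 for a, b in zip(base, perm))] += 1
    return counts

def exact_two_tailed_p(n, rho_obs):
    """Probability, under the null hypothesis, of an absolute rho at least
    as large as the observed one."""
    counts = sum_d2_counts(n)
    scale = n * (n * n - 1) / 6.0          # rho = 1 - sum_d2 / scale
    eps = 1e-9                              # guards against float round-off
    low = (1.0 - abs(rho_obs)) * scale + eps
    high = (1.0 + abs(rho_obs)) * scale - eps
    extreme = sum(c for s, c in counts.items() if s <= low or s >= high)
    return extreme / sum(counts.values())

p_value = exact_two_tailed_p(7, 0.57)
```

For N = 3 the enumeration gives only four possible values of rho, with the two middle values the most frequent, which is the bimodal shape noted earlier.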

In both Figure I and Table 1 we are testing hypotheses 
concerning the absolute value of rho.

SUNDAY, MARCH 26
EXECUTIVE COUNCIL MEETING, ACPA 
MONDAY, MARCH 27 

GENERAL SESSION 

Presiding . . . . Hilda Threlkeld

Dean of Women, University of Louisville
Symposium: “Counseling Problems and Techniques: Developments
for the Future in the Light of an Evaluation of the
Present.”

"Developments in Counseling by Faculty Advisers” 

Carroll Miller, Assistant Dean of College of Liberal 
Arts, Howard University 
"Developments in Residence Hall Counseling” 

Merle M. Ohlsen, Associate Professor of Education, 
Washington State College 
“Developments in Counseling Bureaus and Clinics” 

Royal B. Embree, Assistant Director, Counseling Bureau, 
University of Texas (Read by Gordon Anderson, Director 
of Counseling Bureau, University of Texas) 

LUNCHEON 

Presiding . . . . Mitchell Dreese

Dean of the Summer Sessions and Professor of Educational
Psychology, George Washington University 

“No Vain Imaginings” . . . . Thelma Mills

Director Student Affairs for Women, University of Mis¬ 
souri, and President ACPA 

FIRST BUSINESS MEETING 

Presiding ...Thelma Mills 

Director Student Affairs for Women, University of Mis¬ 
souri, and President ACPA 
Reports: 

Kate Mueller, Chairman Committee on Research 
Clifford Houston, Chairman Committee on Standards 
George A. Pierson, Chairman Committee on Nominations
Lyle W. Croft, Chairman Committee on Membership

SECTIONAL MEETING 

Presiding . . . . Jacob H. Cunningham

Dean of Students, Lynchburg College 
“The Role of the Church Related College in Higher Education.” 
Raymond F. McLain, President, Transylvania College, Lex¬ 
ington, Kentucky 


TUESDAY, MARCH 28
"Council Day"

WEDNESDAY, MARCH 29

SECTIONAL MEETINGS:

"Major Problems of Personnel Administration of Concern 
to All College Personnel Workers" 

1. Presiding . . . . Dugald Arbuckle

Director Student Personnel, School of Education, Boston 
University 

Panel Discussion (for those from large universities and colleges)

Panel Members:

Martin Snoke, Assistant to the Dean of Students, University
of Minnesota

John L. Bergstresser, Assistant Dean of Students, University
of Chicago

Daniel D. Feder, Dean of Students, University of Denver

Presiding . . . . Everett B. Sackett

Dean of Student Administration, University of New Hampshire

Panel Discussion (for those from middle-sized colleges and
universities)

Panel Members:

Robert Kamm, Dean of Students, Drake University
Nathan Kohn, Registrar, Washington University
William C. Craig, Acting Dean of Students, Washington
State College

Presiding . . . . L. R. Palmerton

Director Student Personnel, South Dakota School of Mines
and Technology

Panel Discussion (for those from small liberal arts colleges,
church-related colleges, and teachers colleges)

Panel Members:

Lawrence Riggs, Dean of Students, DePauw University
Helen M. Voorhees, Director, Appointment Bureau, 
Mount Holyoke College 
Louise T. Paine, Dean, Elmira College 

SECOND BUSINESS MEETING 

Presiding . . . . Thelma Mills

Director Student Affairs for Women, University of Missouri,
and President ACPA

Paul McMinn, Chairman Committee on Publications
Ralph Carli, Chairman Committee on International Relations
C. H. Reudisili, Chairman Committee on Proceedings
GENERAL SESSION 

Presiding... L. Shepard 

Dean of Student Personnel, Stephens College
Main Speech: "Evaluation and Research in Group Dynamics”
Kenneth F. Herrold, Assistant Professor of Education, 
Teachers College, Columbia University 
Two Illustrative Studies, reported by: 

Ira J Gordon, Kansas State College 
David S. Brody, Montana State College 

GENERAL SESSION

Presiding . . . . Paul C. Polmantier

Director University Testing and Counseling Services, University
of Missouri

Symposium: “Problems of Evaluation in Student Personnel 
Work” 

“How to Go About The Process of Evaluating Student 
Personnel Work” 

William M. Gilbert, Acting Director Student Counseling 
Bureau, University of Illinois 
“Major Limitations in Current Evaluation Studies”

Ruth Strang, Professor of Education, Teachers College, 
Columbia University 
Two Illustrative Studies, Reported by: 

Robert B. Kamm, Dean of Students, Drake University 
Edgar Z. Friedenberg, Adviser, University of Chicago

SOCIAL HOUR 

Hostess . . . . Anna M. Hanson

Director of Placement, Simmons College

SECTIONAL MEETINGS:

(These will be Discussion Groups—no planned speeches— 
attendance at each limited to the first 25 people to apply for
special admission card at Information Desk. Prerequisite for 
obtaining card is willingness to talk on the topic listed.) 

1. Discussion Leader . . John Withal 

Assistant Professor, Department of Education, Brooklyn 
College 

Topic: “To What Extent Should the Use of Test Results Be 
Limited to Qualified Personnel?” 

2. Discussion Leader . . . . Robert H. Shaffer

Assistant Dean of Students, University of Indiana
Topic: “How Can We as Student Personnel Workers Stimulate
and Motivate the Student with Higher Ability?” 

3. Discussion Leader . . . . M. Catherine Evans

Assistant Director of Counseling, University of Indiana
Topic: “The Use of Sociometric Techniques in Residence Hall
Work.”

4. Discussion Leader . . . . Nathan Kohn, Jr.

Registrar, University College, Washington University 
Topic: “Are Freshmen Orientation Courses Desirable?”

THURSDAY, MARCH 30

SECTIONAL MEETINGS:

1. Presiding . . . . C. W. McCracken

Dean of Students, Muskingum College 
Symposium: "Student Activities in Relation to College Person¬ 
nel Work" 

"The Role of Student Government in the Student Personnel 
Program" 

Brother Louis, Dean, St. Mary’s College, Winona, Min¬ 
nesota 

"Student Personnel Work and the National Student As¬ 
sociation" 

Gordon Klopf, Chairman, National Advisory Council, 
N.S.A., University of Wisconsin
"Contributions of the Student Union to the Total Student
Personnel Program" 

Donovan D. Lancaster, President, National Association 
College Unions, and Director, Moulton Union, Bowdoin 
College 

2. Presiding . . . . Robert F. Moore

Director, Personnel Office, Columbia University
Panel Discussion: "Reciprocal Contributions of Student Personnel
and Industrial Personnel"
Panel Members: 

Donald S. Bridgman, Personnel Department, American 
Telephone & Telegraph Co. 

Forrest H. Kirkpatrick, Dean of Students, Bethany 
College 

Otis C. McCreery, Director of Training, Aluminum Company
of America


SECTIONAL MEETINGS:


1. Presiding . . . . Walter F. Johnson

Associate Professor, Institute of Counseling, Testing and 
Guidance, Michigan State College 
Symposium: "Selection and Training of College Personnel 
Workers" 


Speakers:

"Problems and Trends in the Selection for Training of College
Personnel Workers"

George A. Kelly, Director, Psychological Clinic, Ohio
State University

"Major Issues and Trends in the Graduate Training of
College Personnel Workers" 

Willard W. Blaesser and Clifford P. Froehlich, 
United States Office of Education 

2. Presiding . . . . Donald J. Shank

Vice President, Institute of International Education, New
York
Symposium: “Broader Horizons in Personnel Work”

Speakers: 

“The Employment Outlook for 1950 College Graduates” 
Ewan Clague, Commissioner Labor Statistics, United 
States Department of Labor 

“Aspects of Manpower Mobilization of Significance to College
Personnel Workers” 

James C. O’Brien, Associate Director Manpower, National 
Security Resources Board 
“Our Stake in the Occupied Countries” 

Harold E. Snyder, Director, Commission on Occupied 
Areas, American Council on Education 
“Plans for the New International Christian University in 
Japan” 

Maurice E. Troyer, Vice President in Charge Curriculum
and Instruction, Japan International Christian Univer¬ 
sity Foundation 



AMERICAN COLLEGE PERSONNEL ASSOCIATION, OFFICERS AND
COMMITTEES

OFFICERS, 1949-50

President: Thelma Mills, Director, Student Affairs for Women,
University of Missouri 

Vice President: E. H. Hopkins, Vice President, State College of 
Washington 

Secretary: Robert H. Shaffer, Assistant Dean of Students, Indiana
University

Treasurer: Marcia Edwards, Associate Dean, College of Education,
University of Minnesota

EXECUTIVE COUNCIL, 1949-50

Gordon V. Anderson, Director, Bureau of Testing and Counseling, 
University of Texas 

Willard W. Blaesser, Specialist for Student Personnel Programs,
U. S. Office of Education

Edward S. Bordin, Director, Bureau of Psychological Services,
University of Michigan

Daniel D. Feder, Dean of Students, University of Denver

Forrest H. Kirkpatrick, Dean of Students, Bethany College

OFFICERS, 1950-51 

President: Thelma Mills, Director, Student Affairs for Women, 
University of Missouri 

Vice President: E. H. Hopkins, Vice President, State College of 
Washington 

Secretary: Robert H. Shaffer, Assistant Dean of Students, Indiana 
University 

Treasurer: Marcia Edwards, Associate Dean, College of Education, 
University of Minnesota 

EXECUTIVE COUNCIL, 1950-51

Gordon V. Anderson, Director, Bureau of Testing and Counseling,
University of Texas

Lyle W. Croft, Director of Student Personnel Services, University of
Kentucky

Clifford E. Erickson, Professor of Education, Michigan State
College

A. Blair Knapp, Vice President, Temple University

Donald E. Super, Professor of Education, Teachers College, Columbia
University

PROGRAM COMMITTEE, 1949-50

Cornelia D. Williams, Chairman, Associate Professor and Counselor,
General College, University of Minnesota

Norman Lange, Director of Student Personnel, University of Vermont

Dugald S. Arbuckle, Director of Student Personnel, Boston Uni¬ 
versity 

John S. Beard, 5835 Kimbark, Chicago 37, Illinois

Lucile B. Brown, Child Education Foundation, New York, N. Y.

Ralph B. Bridgman, President, Merrill Palmer School

Henry J. Cunningham, Dean of Students, Lynchburg College

Janice A. Janes, Counselor in Occupational Guidance, Stephens
College

Victor B. Johnson, Associate Dean of Men, Clark University

Margaret Ruth Smith, Associate Admissions Officer, Wayne Uni¬ 
versity 

Thomas S. Richardson, Director of Student Personnel, Texas 
Christian University 

Albert S. Thompson, Associate Professor of Education, Teachers
College, Columbia University

CONVENTION COMMITTEE CHAIRMEN, 1949-50

James A. McClintock, Director of Personnel, Brothers College,
Drew University, Local Arrangements

William M. Wise, Dean of Student Personnel, University of Florida,
Exhibits

Helen M. Voorhees, Appointment Bureau, Mt. Holyoke College,
Information

Mary D. Bigelow, Chairman of Advising, Stephens College, Meals

Anna M. Hanson, Director of Placement, Simmons College, Hos¬ 
pitality 

Robert H. Shaffer, Assistant Dean of Students, Indiana University, 
Publicity 

John H. Cornehlsen, Jr., Professor of Education, Department of
Guidance and Personnel Administration, New York University, 
Meetings 

Clare I. Davis, Dean of Men, Southern Illinois University, Place¬ 
ment. 


ACPA COMMITTEE CHAIRMEN, 1949-50 

Kate Hevner Mueller, Indiana University, Research 
Clifford Houston, University of Colorado, Standards 
George A. Pierson, University of Utah, Nominations
Lyle W. Croft, University of Kentucky, Membership
Paul McMinn, University of Oklahoma, Publications
Ralph A. Carli, Stevens Institute of Technology, International
Relations
C. H. Ruedisili, University of Wisconsin, Proceedings
Ralph Bridgman, Merrill Palmer School, Public Recognition
Wray H. Congdon, Lehigh University, Local Arrangements



EDITORS' FOREWORD 


The twenty-third annual meeting of the American College Personnel
Association was held at Atlantic City from March 27 to 30, 1950
in cooperation with the constituent members of the Council of
Guidance and Personnel Associations. The convention program was
organized to develop the theme, "The Personnel Profession:
Achievements and Objectives." Twenty papers were read, eight panel
discussions were presented, and two business meetings were held by
ACPA members during the four-day period. Eighteen of these papers
appear in this publication of the Proceedings. Two papers were not
prepared for publication by their authors. The panel discussions held
during this convention were not recorded for these proceedings.

On Tuesday, March 28, the members of ACPA participated in the
program sponsored by the Council of Guidance and Personnel
Associations. At the morning session President Howard R. Beattie made
his Annual Report, after which Thelma Mills, ACPA President, and
the members of her Committee to Consider Unification made an
important proposal to reorganize CGPA into an International Personnel
and Guidance Association. At 11:00 a.m. the convention was
broken down into many small groups where the proposal was explained
further and discussed freely. The convention then reconvened and
accepted the recommendation of the Committee on Unification that
the reorganization proposal be taken back to the members of the
various Associations for their consideration during the coming year
and that final action be postponed until the 1951 convention.

At the "Council Day" luncheon meeting, Mr. Laurence A. Appley,
President of the American Management Association, discussed the
subject, "Greater Utilization of the Educator’s Knowledge of Human
Potential." In the afternoon, Dr. John E. McGowan, Lecturer in
Psychiatry at New York and Columbia Universities, addressed the
convention on the topic, "Psychiatry for Counselors." Later Mr.
William Line, Professor of Psychology at the University of Toronto,
spoke on the subject, "The Scientific Status of Counseling." The
papers presented by Mr. Appley and Mr. Line will appear in the
Journal of the National Association of Deans of Women.

The American College Personnel Association members present at
the convention were informed by the Membership Chairman, Mr.
Lyle Croft, that our organization is now approaching a total
membership of one thousand college personnel workers. With this increase
of almost three hundred associates during the past twelve months,
we are looking forward to another successful year and to the
twenty-fourth annual meeting of the Association, which will be held at
Chicago, March 16 to zp, 1951.

George A. Pierson
University of Utah 




Catherine M. Northrop 
University of Denver 



DEVELOPMENTS IN COUNSELING BY FACULTY ADVISERS


(An Abstract) 

CARROLL L. MILLER 

Assistant Dean of the College of Liberal Arts, Howard University, 
Washington, D. C.

Significant among the recent trends in higher education 
is a growing recognition of the obligation of the college or 
university to each student accepted for admission. One result 
of this development is an increased awareness of the need for 
“individualization” and the necessity for expanding the 
facilities for handling the entrant as a person. 

The organization of these services may vary from institution 
to institution, but the aim is basically the same; namely, to 
assist in the development of the potentialities of the individual 
within the framework of the philosophy of the school. For the 
realization of this aim, the college relies in part on its counseling 
and advisory facilities, which normally include the services of 
faculty advisers, residence hall counselors, and specialists in 
counseling and clinical techniques. 

The success of any program of counseling in college depends 
in a large measure upon the effectiveness of faculty advisory 1 
services, for the bulk of the counseling problems on a campus 
are those needing educational guidance, and the faculty 
adviser is frequently sought by the student when questions 
relating to academic matters arise. 

1 The term, faculty adviser, is used here to refer to the general adviser rather than
the major field adviser. It is felt that the results of an effective faculty advisory service
during the freshman and sophomore years will decrease to a minimum the problems
for the major field adviser in subsequent years.

In order to determine the role played by faculty advisers
in the student personnel programs of institutions of higher
learning in the United States, a questionnaire was sent to nj
selected colleges and universities. Replies were received from
90 of these schools. 1 The instrument was devised specifically to
discover (1) the methods used to select faculty advisers; (2)
the services performed by faculty advisers; (3) the methods of
orienting and training faculty advisers.

Faculty advisers were available in 86 of the 90 schools 
reporting. These advisers were selected by the individuals or 
groups listed below: 


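Selections made by                             Number of Schools

Dean of the College ........................................... 24
Heads of Departments and Dean of College ...................... 15
Dean of College and Coordinator of Counseling .................  8
Heads of Departments ..........................................  6
Coordinator of Counseling .....................................  6
Dean of Students ..............................................  4
Dean of College and Faculty Committee .........................  3
Heads of Departments and Coordinators of Counseling ...........  3
Board of Advisers .............................................  1
Dean of Freshmen ..............................................  1
Dean of Men ...................................................  2
Student Groups ................................................  2
Chairman of General Education Program .........................  1
Coordinator of Counseling, Dean of Men, and Dean of
  Women .......................................................  1
Dean of College, Dean of Men, Dean of Women, Registrar
  and Director of Guidance ....................................  1
Dean of Students and Heads of Departments .....................  1
Dean of Students for College, Staff, Chairmen, Dean of
  College, Dean of Students for University ....................  1
Dean of the University ........................................  1
Heads of Departments, Dean of the College, and
  Coordinator of Counseling ...................................  1
President of the College ......................................  1
President, Dean and Faculty Committee .........................  1

Total ......................................................... 86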


1 Of the 90 institutions from which questionnaires were received, 76 were
coeducational; 79 were members of the American Association of Universities;
approximately half had faculty members belonging to ACPA. These schools were
distributed as follows: New England States 9, Middle Atlantic States 19, Central
States 40, Southern States 16, Western States 6.

In selecting advisers four characteristics were taken into
account and were reported as follows:

Genuine interest in and understanding of students, mentioned
73 times.

Willingness to take time to advise students without additional
compensation, mentioned 14 times.

Knowledge of course requirements, curricula, and regulations,
mentioned 8 times.

Interest in total educational program, mentioned 4 times.

The services performed by faculty advisers in 84 schools
ranged all of the way from assisting students in selecting 
courses to helping students gain insight into their personal 
problems. The activities reported in which advisers engaged 
are listed below: 


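Activities                                      Number Reporting

Assistance in the selection of courses ........................ 86
Assistance in long range academic planning for a career ....... 83
Explanation of academic regulations ........................... 77
Referrals to other agencies ................................... 70
Follow-up of academic progress through periodic reviews
  of records .................................................. 65
Exploration of personal problems .............................. 51
Assistance in securing aids to academic adjustment ............ 33
Entertainment (social) of advisees ............................  1
Rating of each advisee on citizenship .........................  1
Assistance in personalizing freshman week .....................  1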

Some form of in-service training for faculty advisers was
provided in 5§ of the institutions reporting. Periodic meetings
in which common problems were discussed were the most
frequent in-service training method. Other techniques used
were workshops, case conferences, organized summer courses,
and faculty adviser’s handbooks.

In the majority of colleges and universities (75) no reductions
in teaching load were made to compensate for the time spent as
advisers. Additional compensation was provided faculty
advisers by eight institutions; one institution freed advisers
from committee work; and another institution provided
additional compensation and reduced the teaching load.

A few colleges and universities have made definite efforts to
improve their faculty advisory services. Among these are
Stephens College, where an elaborate adviser’s training program
is in effect; Ohio State, where advisory services have been
centralized; Colgate, where graduate students are used to
supplement the services of faculty members; and San Francisco
State, where an instructor-advisory plan is now in operation.

While there is greater concern for the welfare of the individual
college student today than was true a generation ago, indifference
still characterizes the efforts of many faculty advisers.
Among the problems yet to be solved are the following: How 
can faculty advisers be used most effectively? That is, how 
can their services be made a part of the student personnel 
program of the institution? What are the personal character¬ 
istics of an effective adviser? How important is training in 
developing an effective faculty adviser? To what extent should 
faculty advisers attempt to counsel students regarding their 
various adjustment problems? And, finally, what consideration 
can and should be made to compensate faculty advisers for 
their additional responsibilities? 




DEVELOPMENTS IN RESIDENCE HALL COUNSELING

MERLE M. OHLSEN 

Associate Professor of Education, Washington State College, Pullman, Washington 

Have you been following the professional literature which 
has been written on the topic of residence-hall counseling? 
If you have followed it carefully over the last twenty years, 
you have found that it has not consumed much of your time.
It is true that writers in the field of student personnel work do 
mention the topic occasionally. They usually agree that the 
residence-hall program has an important place in the student 
personnel program. 

In preparing this paper it occurred to me that there is one 
general objective of dormitory counseling. It is to help the 
student to better understand himself and his relations with 
people through his day-to-day contacts with interesting and 
friendly individuals who can work and plan with him. The 
purpose of this paper is to consider some of the issues involved 
in achieving this broad objective. Specifically, the following 
issues will be considered: 

1. How are present dormitory counseling services affected 
by the historical developments in student housing? 

2. How does the dormitory staff fit into the general frame¬ 
work of counseling services? 

3. What are some of the services which the dormitory
counselors can provide? 

Let us consider these issues in the order in which they were
stated. Stewart 1 reported that the problem of student housing
dates back to the very beginning of the great European
universities. This fact in and of itself is not so important, but her
account of the gradual shifts in the student’s role in house
government does have a direct bearing upon student-staff
relationships. She traces the change as follows: "... in the
mental flowering and freeing of the Renaissance, residence
halls were largely student governed; and that little by little
as learning formalized, authority for the conduction of the
life in them was removed from student hands until it rested
completely with the college authorities." 2 If we can accept the
statements of philosophy in present-day dormitory staff
manuals as an indication of change in practice, it would appear
that we are now moving in the direction of more democratic
student-staff planning within dormitories.

1 Helen Q. Stewart. Some Social Aspects of Residence Halls for College Women.
New York: Professional and Technical Press, 1942, p. £.

In any case, we cannot treat the development of the phil¬ 
osophy of student personnel work as it pertains to residence-hall 
groups as if it were independent of the rest of the student 
personnel program. Relative to the beginning of student 
personnel work in this country Cowley 3 said that the first
college dean seems to have given most of his attention to 
disciplinary problems. Now if we recall that the early dean 
often lived in a dormitory as proctor, we see even more clearly 
why there was a staff-dominated relationship. It is probable 
that the pattern which was set in these early programs may 
still plague our dormitory counseling programs today. 

It is not likely that the dormitory counselor does his best 
work if he still holds the “papa or mamma knows best” attitude.
We need professional leaders who can work with the students 
in helping them make plans rather than leaders who devise 
the plans and attempt to sell them to the students’ elected 
leaders. 


The Problem of Dormitory Staff 

What has just been said brings to the fore the second issue,
that of dormitory staff. It is a problem to find staff members
who have the training and the personal security which allow
them to work with the students democratically. Orme 4 said that
being a good disciplinarian and a "nice woman who loves young
people" are no longer adequate qualifications for dormitory
heads. Whereas they may have been adequate qualifications
for the housemother’s position, they are not adequate
qualifications for the position of head counselor. Merely liking
young people and being kind to them does not qualify the
dormitory counselor to provide the kind of services which we
shall consider here.

2 Ibid., Stewart, p. 93.

3 W. H. Cowley, "Some History and a Venture in Prophecy." Trends in Student
Personnel Work, E. G. Williamson (Ed.). Minneapolis: University of Minnesota
Press, 1949.

4 Rhoda Orme, Counseling in Residence Halls. A Report of a Type C Project, Doctor
of Education Degree, Teachers College, Columbia University, 1948.

It is true that some of the colleges and universities have met 
this problem. However, we still do have many housemothers 
as head counselors. Some schools place teaching faculty in the 
dormitories as head counselors; others use a combination of 
teaching staff and undergraduate assistants. Still others staff 
the living units with graduate counselors. A few employ 
well-trained, full-time head counselors. The full-time teaching 
staff member probably is too busy to give the job the time it 
really takes. Moreover, the job usually demands his attention 
at the time of day when he prefers to be doing something else. 

For the schools which do have the doctoral program, the 
mature doctoral candidate in student personnel, who has 
had personnel experience, appears to be the most promising 
candidate for the head counselor position. First, he has special 
training. Second, he needs the experience and he is motivated 
to do a good job. Third, he will be on the job at least three 
years. His services can be supplemented with upper-class 
undergraduate students. The young graduate student who is 
working on a half-time assistantship rounds out the staff 
nicely. I shall not treat either the problem of the number of 
staff members needed in a dormitory or the exact qualifications 
each should have. However, I shall define a given dormitory 
situation, describe the staff, and treat the problem of services 
in relation to these factors. 

A Specific Dormitory Situation 

Now let us think about the specific situation. I shall assume 
that the hall houses one hundred students. It has a full-time 
Head Counselor. To assist him in the more specialized services 
he has a Counseling Assistant who works half time in the 
residence-hall program and does half-time graduate work. 
There are also five carefully chosen undergraduate assistants 
who serve without pay. They act as liaison workers between 
the students and the paid staff. The Head Counselor has
overall responsibility for the dormitory. Naturally, it is up to
him to develop a training program for his own staff. Not
only has he the responsibility for training the six staff members 
described above, but he is also responsible for helping each 
of the dormitory officers to define and learn how to carry out 
the duties of his office. 

It is obvious that we should know more about the given 
situation before we attempt to plan a counseling program for 
it. We should have more information about the students who
live there, the kinds of people the individual members of the
staff are, and the arrangement of the dormitory itself. But
to know that these elements are important suffices for our 
purposes here. 

In passing, something should be said about behavior prob¬ 
lems. I believe that we should help students to take responsi¬ 
bility for their own actions. If a student is accused of breaking 
the social code for the house, his case should come to the 
attention of the Head Counselor. He, in turn, would help 
the house officers to collect the facts about the alleged violation. 
On the basis of the facts, the House Council would make a 
decision on the case. Should they decide that the case is some¬ 
thing which is too difficult for them to handle, the student 
would be referred to the student-faculty discipline committee.
In any case, the house officers should keep detailed notes on 
the case and the disposal made of it. 

Working Relationships 

Since we are thinking about the staff, probably I should 
comment on student-counselor relationships. Even in the 
residence halls in which students really have had a chance to 
experience democratic planning, the feeling between the 
Counselors and the students is different from that in the 
Counseling Center. The dormitory staff member and the 
student are personal friends. The dormitory is the
home-away-from-home. Hence, we have more of a friend-to-friend
counseling relationship rather than a clinical relationship.
Here the friendly staff member tries to help the individual
students solve problems either individually or in groups. The
staff member not only tries to help individuals, but he tries
to set up situations in which students can help each other. 

The whole problem of student-counselor relationship also 
identifies the need for more reflection on the issue of the 
student’s role in the house government. It is my own conviction 
that democratic planning not only helps to create a better 
within-house feeling, but it also stimulates greater personal 
development of the students. If we mean to work with students 
democratically we must trust them and their judgments. 
We must be willing to take chances and even to allow them to 
make mistakes. They must feel that they can settle issues 
through democratic processes and even go ahead to try a 
project which the Head Counselor has verbally opposed. 
This does not mean that the staff leader is not a participating 
member of the group. He is a member of the living group and 
as such he has the right to state his arguments in the case. 
The point is that the staff member should not insist on having 
his way. Granted, some may feel that too much has been made 
of this point, but failure to reach an understanding here often 
seriously affects other staff-student relationships. It is im¬ 
portant that there should be established a feeling of mutual 
trust—an atmosphere in which students and staff can work 
together democratically in creating and maintaining a living 
environment with greatest educational, social and cultural 
values. 


Questions of the Teaching Staff

It is also important that the Dormitory Counselors learn to 
work with the teaching staff. Many questions have been raised 
by the teaching staff. Suppose we consider just three questions 
which I heard a staff raise recently: 

1. Just what is it that Dormitory Counselors do?

2. Would we be able to notice any difference in our students
if these services were discontinued? 

3. Is this the best and most economical way of providing 
these services for students? 

We will just have to admit that we do not have the answers 
to the last two questions now. That means we had better get
busy and evaluate our program. We may need these facts all
too soon. The rest of this paper will be devoted to the first 
question. Just what is it that Dormitory Counselors do? 

Duties of the Undergraduate Assistant

The undergraduate assistant acts as a liaison between the 
students and the staff. He makes his contribution by providing 
the following services: 

1. By helping students to become acquainted in the house,
both with the students and the staff.

2. By becoming well acquainted with every student in his
section, knowing their special interests, abilities, and problems.

3. By referring students for help. 

4. By knowing the student resources in the house for special 
tutorial help. 

5. By distributing information which helps all the students 
keep well informed on both house and college-wide activities 
and regulations. 

6. By helping to promote good house government.

7. By helping to create and maintain a friendly atmosphere. 
Obviously, this undergraduate assistant would soon lose his 
opportunity for real leadership in the dormitory if he ever 
became an inspector for an autocratic Head Counselor. 

8. By recognizing morale problems early; since he works
with a smaller group of students in the dormitory, he is able to
help the head counselor understand sources of difficulty.

The Dormitory Counselor's Services

We have noted some of the things which the undergraduate 
assistant does. Now what is it that the Head Counselor and the
Counseling Assistants do to help students? 

1. The Dormitory Counselor should make himself available
to students when they need to talk to a friend about personal 
problems. Those of us who have worked in dormitory programs 
know that the Head Counselor and Graduate Counseling 
Assistant can expect to be visited any time of the day or night. 
The student who knocks at the door during the night probably 
is too troubled to either sleep or study. He may need no more
than personal attention at the time when things have gone
badly. He probably feels the need to talk to a mature friend. 
However, the trained Dormitory Counselor realizes that the 
student may need therapy which goes beyond the scope of his 
job and his competencies. 

2. Students want the Dormitory Counselors to help them
with their activities. The Dormitory Counselor should do more 
than merely help students with the activities they now have. 
He should try to discover the students’ interests, then organize 
small groups to meet individual needs. Some of these small 
“cell” groups give a student a chance to achieve a measure of 
security which in turn helps him find and become affiliated 
with campus-wide activities. 

3. Social programs also provide the staff with another 
chance to help individuals in groups. Such activities as dinners, 
teas and coffee hours, dances, lectures, musicales, and discus¬ 
sions, all are a part of social education. These experiences 
can help the student to learn to live in a group and to appreciate 
some of the cultural values which a college education should 
provide. On the other hand, it is possible for the staff to 
promote a social program which the students neither want nor 
appreciate. Under these conditions little learning takes place. 

4. Inasmuch as the dormitory staff member does have a 
chance to see a student living in a variety of situations, he can 
provide facts about the student which help others who also
work with the same student. Dormitory staff members often 
pick up information about the student's family, his personal 
problems, health, study skills, special learning problems, and 
study conditions within the house. Some of these facts which 
the Dormitory Counselor discovers also help such special 
college committees as the ones on scholarship and discipline. 

5. Dormitory workers can become acquainted with the 
students who need special help. It is important that the 
Dormitory Counselor not only recognize these students who
need special help but that he also know the referral agencies
and techniques of referral. All of us would certainly agree 
that the Dormitory Counselor must be thoroughly acquainted 
with each agency and its service before he can make an intel¬ 
ligent referral. The referral agency’s staff also has a responsibility
for working cooperatively with the Dormitory Counselors.
Status difference between these two levels of counseling often
complicates this task. Since Dormitory Counselors are involved 
by the mere fact that they live with the student, they must be 
kept informed about the student’s progress and the part they 
can play in helping him to insure that both of these Counselors 
are giving the student integrated help. 

6. If the dormitory is to become the student’s
home-away-from-home, then the staff must help to orient him to college.
Ideally, the orientation to college should be started in high 
school. There should also be a college-wide program for the 
orientation of the new student. Nevertheless, the dormitory 
orientation program can be a vital factor in the student’s 
adjustment to college and life away from home. The dormitory 
staff should help the new student become acquainted with 
other students and the college program. 

7. The exit interview is another natural service of the 
Dormitory Counselor. Inasmuch as students do frequently drop 
out of school before they adjust to college work this is certainly 
a needed service in the dormitory. We should accept the 
student’s decision to drop school and allow him the freedom
he needs to talk out his decision. The very fact that we accept 
his decision to drop out and try to help him plan for the future 
often causes him to change his plans and stay in school. This is 
particularly true when his long-term plans do involve college 
training. 

8. Another problem of adjustment to college is the one of 
quality of scholarship. Actually, there may be as many as four 
elements in this problem for students: (1) developing good 
study conditions in the dormitory, (2) helping students to
budget their time efficiently, (3) giving assistance in developing
good study habits and study methods, and (4) improving
reading skills. Of these four elements the dormitory staff can
often help with the first three, but they will usually refer the
students to the reading clinic for the fourth service. 

9. And, finally, there is one other large area of service in 
which Dormitory Counselors may give help—in educational- 
vocational planning. It is true that the teaching faculty should 
do the academic counseling, and that careful vocational
appraisal should be made with the help of a Clinical Counselor.
Even so, the students do talk to the Dormitory Counselors 
about individual courses and fields of study. Hence, the 
dormitory staff member should have vocational information 
available to him. He also needs special job information on the 
fields of study available to students at the college. On the other 
hand, the dormitory staff should also refer the student to the 
college’s vocational information library. He certainly will 
want to refer some of the students to a more specialized 
counseling service for testing and counseling. 

Then, there are certain counseling services which Dormitory 
Counselors can provide. Obviously, not every dormitory staff 
will be able to provide help in all of these nine areas. The 
services the staff in a particular residence hall provides must 
be determined by the quality of the staff and the services 
provided by the other student personnel agencies. And since 
the residence hall program is just one part of the whole college 
program, I decided to conclude this paper with four questions 
for which answers are still needed: 

1. What are the in-service training needs of your Dormitory 
Counselors? 

2. Are we making use of the personnel techniques developed
by other agencies and are we adapting these techniques for 
use in residence hall programs? 

3. Is this the best and most economical way of providing the 
counseling services defined in this paper? 

4. Would the teaching staff notice any change in the students 
if dormitory counseling services were discontinued?



DEVELOPMENTS IN COUNSELING BUREAUS AND CLINICS

ROYAL B. EMBREE

Assistant Director, Counseling Bureau, University of Texas (Paper read by Gordon
Anderson, Director, Counseling Bureau, University of Texas)

Introduction 

For many years it has seemed to the writer that the first 
need in any speech or article dealing with student personnel 
work is for a clarification and definition of the very title 
itself. This notion was reinforced by Dr. Cowley's justifiably 
choleric variation upon the semantic theme in the Minnesota 
publication Trends in Student Personnel Work. 1 Therefore, the
beginning effort in this paper will be aimed at the provision
of some basic premises for the consideration of “Developments
in Counseling Bureaus and Clinics.”

One of the most striking and productive phases of the 
personnel-guidance mental hygiene movement during the past 
two decades has been the establishment of a large number of 
comprehensive agencies, often on college and university 
campuses, which were designed to provide professional as¬ 
sistance to people through the channels of self-appraisal 
and counseling. These organizations, whether they arose 
under the sponsorship of the community or of an educational 
institution, have made a tremendous contribution to the 
meeting of individual developmental needs, not only through 
their direct service to people, but also through their emphasis 
upon professional training of staff members, scientific meth¬ 
odology and fundamental research. An effort will be made 
in the following section to trace the origin and growth of 
centralized psychological agencies in colleges and universities. 
The important points to consider now are the facts that
(1) these agencies developed with a wide variety of titles,
and (2) these agencies developed in direct response to the
evident needs of their potential clientele and not as planned
aspects of total institutional student personnel programs.

1 Cowley, W. H. "Jabberwocky Versus Maturity." Trends in Student Personnel
Work. Minneapolis: University of Minnesota Press, 1949. Pages 342-349.

This paper will be confined to a consideration of counseling 
agencies which have been developed by colleges and universi¬ 
ties. There has been little agreement with respect to the 
names given to these organizations. An opportunity to study 
this matter of nomenclature was provided by the excellent 
directory of counseling agencies recently released by the 
Ethical Practices Committee of the National Vocational 
Guidance Association. 2 Fifty agencies sponsored by institutions
of higher learning were included in the Directory. Of this
number, twenty-three, or nearly half, used the term center
in their listed titles. The next most popular designation was
service, used by six institutions. Four listed agencies had titles
which included the word bureau. In three cases, no title was 
stated. Other descriptive titles and their incidence were 
as follows: department, 3; clinic, 1; office, 2; division, 1;
unit, 2; laboratory, 2; and institute, 1.

The counseling agencies listed in the Directory included 
many, but by no means all, of the more active and better-known 
organizations in the colleges and universities of this country. 
It is clear that the terms used in the title of this paper are 
among the less popular ones and that preference is tending 
overwhelmingly toward the use of center in the description of 
these counseling agencies. It seems reasonable to predict that 
this preference will continue, since center has been very widely 
used in describing facilities for the counseling of veterans 
which are rapidly being converted into general college 
counseling organizations. 

The Directory also provides some interesting information 
concerning the second major point made above. Only seven 
of the fifty listed agencies appear to restrict their clientele 
to the students of their parent institutions. (It is obvious that 
agencies which do so limit clientele would be less likely than 
others to list themselves in the Directory.) Approximately


2 Ethical Practices Committee, National Vocational Guidance Association. 1950
Directory of Vocational Counseling Agencies. St. Louis, Missouri: Washington
University, 1950. 98 p.

80 per cent of listed centers are open to adolescents and adults
outside the institution and about 20 per cent are open to
outside clients of all ages and levels of schooling. The median
listed fee for non-institutional cases falls between $20 and $25. 
Twenty-seven of the fifty agencies indicate that they counsel 
veterans under contract with the Veterans Administration. 
It is clear that the majority of these centers have been
developed to serve the needs of a clientele extending well beyond
the limits of the institutions which sponsor them. This extension 
of facilities represents an important public service, but, by 
strict interpretation, it carries the counseling service beyond 
the logical limits of a student personnel agency. On the other 
hand, however, thirty of these listed centers provide free 
service to the students of their parent institutions, indicating 
that they have been developed, at least in part, to meet 
intramural student needs. 

This prevalent dualism in collegiate counseling centers 
raises an important point. Many of these organizations are 
actually student-personnel facilities to only a partial degree, 
and this is especially true of some of the most extensive bureaus, 
centers and services. Other functions such as clinical work 
with children, general adult counseling, industrial consultation, 
examining, test-scoring and educational research may well 
occupy the greater share of the agency’s time and personnel. 

It is proposed that the subject of this paper be reworded as 
The Central Counseling Facility for Students in Colleges and 
Universities, defined as follows: 

A central counseling facility is an integral part of a student 
personnel program which provides an opportunity for
specialized counseling, by professional workers with access to the various
technical devices which are being developed in the field of 
counseling. 

Such a facility may be part of a very extensive bureau or 
psychological service center. It may as well be the counseling 
office of a small liberal-arts or junior college, manned by a 
single professionally trained clinical counselor. Actually, there 
may be several central counseling facilities inside the same 
university, each representing a nuclear development within
some subdivision of the total institutional structure. The
size of these centers, and the variety of services provided, will 
cover a broad range and will be conditioned by the
characteristics of the institutions which develop them. The crucial
point of the concept offered here is that the central counseling 
facility would be recognized by definition as an integral part 
of the total student personnel program of the institution, 
and would be considered separately from the other worthy 
functions often allocated to agencies which render psychological 
services in colleges and universities. It would seem probable 
that such a line of thought should tend to eradicate the rather 
insular characteristics of many counseling centers, thereby 
improving their integration with other aspects of the
institution’s total program of services to students.

The Origin and Growth of Central Counseling Facilities in
Colleges and Universities 

Counseling centers in colleges and universities have tended 
to develop around the interests and stimulations of certain 
individuals and, in most cases, have been organized well in 
advance of the growth of generalized student personnel
programs in their parent institutions. The result has been a
widespread effort to meet individual needs, institutional and 
otherwise, by providing the best possible services in the 
areas of self-appraisal through measurement and/or the 
counseling of individual clients. 

Perhaps the most satisfactory framework for considering 
the development of these counseling centers has been provided 
by E. G. Williamson in the first chapter of his book, Counseling 
Adolescents. 3 He proposes that the two great emphases upon
counseling to date have been (1) counseling as vocational
guidance and (2) counseling as psychotherapy. A tracing
back of the factors involved in the development of central 
counseling facilities in institutions of higher learning will show 
that they have tapped these two principal sources. 

The emphasis upon vocational guidance was apparent 
in the organizations developed in communities and school

3 Williamson, E. G. Counseling Adolescents. New York: McGraw-Hill Book
Company, 1950. 548 p.

systems to meet needs in this area. The early period of
organization has been effectively described by Reed. 4 Probably
the earliest establishment was the Vocational Bureau of Boston 
in 1909 under the direct influence of Frank Parsons. This 
type of service was reproduced many times in other school 
systems, and, shortly after the close of World War I, there 
were numerous people in colleges and universities who wished 
to make this vocational-educational service available to 
students in general. These people were usually employed
by departments of psychology or educational psychology 
and thus it happened that the vocational and educational 
services in which they believed tended to develop within the 
confines of these departments. 

The emphasis upon personal problems and therapeutic 
counseling has also exerted a great influence upon the
development of central counseling services in colleges and universities.
Members of psychology departments, and especially clinical 
psychologists, were concerned at a very early date with the 
individual emotional and developmental problems of college 
students. Their efforts to meet needs in this area were
crystallized under departmental sponsorship and often grew into
independent central counseling facilities. In a few cases, 
leadership in personal counseling originated with and was
supported by the student health service of a college or university.

A few specialized references may provide body and color 
to this discussion. In 1934, Williamson reported on the
organization of the University Testing Bureau at the University of
Minnesota in 1932. 5 He described how the Bureau was
developed to meet the increasingly complicated needs of students
and he outlined the philosophy and procedure of the service in 
clear detail. He reported that 1,932 cases had been handled 
during the period 1932-1934, and that these individuals
represented a reasonably random sample of the university 
population. This central counseling facility grew out of the 
interest and stimulation of Donald G. Paterson who brought 

4 Reed, Anna Y. Guidance and Personnel Services in Education. Ithaca, New York:
Cornell University Press, 1944. 496 p.

5 Williamson, E. G. "Biennial Report of the University Testing Bureau, 1932-1934."
P. 343 ff. in Report of the President for the Biennium 1932-34. Minneapolis: University
of Minnesota, 1933.

his war-sharpened breadth of thinking to the Minnesota
campus. 6

Another type of development is represented at Ohio State 
University. Stogdill, who was a member of the Psychology 
Department, has reported upon the treatment of cases which 
dated back into the 1920's. 7 The writer can vouch personally
for Dr. Stogdill’s excellent work and he recalls having sat in 
on a case of hypnotherapy handled by Dr. H. H. Goddard 
but he remembers that student services were far from integrated 
since he also worked during 1930 with Dr. Louella Cole in a 
program designed to assist students in the improvement of 
reading and study habits. The clinical approach to student 
problems at Ohio State moved into the era of Rogers and still 
exists as a distinctly parallel facility to the Occupational 
Opportunities Service which is more clearly orientated to 
educational-vocational problems. 

McKinney has described the foundation and development 
of the "College Adjustment Clinic" at the University of
Missouri. 8 This agency was developed about 1938 as an
outgrowth of the Student Health Service. It is understood 
that it exists at present in tandem with a central counseling 
facility of a clearly educational-vocational nature which was 
developed to meet the demands of veteran advisement. The 
same dichotomy of emotional and vocational-educational 
services may also be found at the Universities of Chicago and 
Oklahoma, and elsewhere in the country. 

A more comprehensive service is described by Bailey, 
Gilbert and Berg at the University of Illinois. 9 This central
counseling facility was designed from the beginning to utilize 
the services of clinical counselors and also of trained faculty 
counselors who were detailed to educational-vocational work 
with students. 

6 Williamson, E. G. Trends in Student Personnel Work. Minneapolis: University of
Minnesota Press, 1949. 417 p.

7 Stogdill, E. L. "A Survey of the Case Records of a Student Psychological
Consultation Service Over a Ten-Year Period." Psychological Exchange, III (19+3)>. 1 ac ?-*J 3 -

8 McKinney, Fred. "Four Years of a College Adjustment Clinic, I. Organization of
Clinic and Problems of Counselees." Journal of Consulting Psychology, IX (1945).

9 Bailey, H. W., Gilbert, William M. and Berg, Irwin A. "Counseling and the
Use of Tests in the Student Personnel Bureau at the University of Illinois." Educational
and Psychological Measurement, VI (1946), 37-60.

In conclusion, it may be pointed out that the development
of central counseling facilities in colleges and universities 
has resulted from an emphasis upon vocational guidance, upon 
personal counseling, or upon a combination of these two 
factors. The actual patterns of development in most cases 
have been highly individualistic, dependent upon the
personalities and viewpoints of the principal influencers of growth.
The relative newness in the student personnel scene of college 
counseling services and their close identification with the 
persons who founded and developed them account for the 
wide variations which exist today in matters of philosophy, 
function and policy. 

Present Trends in the Development of Central Counseling
Facilities in Colleges and Universities 

The most striking trend in the development of counseling 
services in colleges and universities is the rapidity with which 
these agencies are being activated on the campuses of this 
country. It was mentioned above that the 1950 Directory of 
Vocational Counseling Agencies listed fifty counseling facilities 
sponsored by institutions of higher learning. In a few moments, 
the writer was able to think of twenty-five active college 
counseling services which he knows of personally and which 
were not included in the Directory. Surely, there are many 
more. If the broad definition given for a central counseling 
facility be accepted, one could add to the list a large number of 
strictly intramural but professionally manned offices in smaller 
colleges and universities. The development of these counseling 
centers and services represents one of the most active areas of 
student personnel work. 

Probably no one factor has contributed more to the expansion 
of college counseling services than the Veterans Administration 
College and University Guidance Program. The implications 
of this extensive subsidization of counseling facilities for 
veterans on college campuses were discussed by Dreese at the 
1949 meeting of the American College Personnel Association. 10
He reports that there were 415 centers in cooperating institu- 

10 Dreese, Mitchell. “Present Policies and Future Plans of College Guidance Centers
Operating under V. A, Contracts—A Survey of the American Council on Education." 
Educational and Psychological Measurement, Part II, IX (1949), 558-578.

tions at the peak of the program and that an estimated 1,000,000
cases had been counseled by March 1, 1949. He endeavored to
find the attitude of college administrators toward these services 
and their plans for the centers following the termination of 
government contracts. There seems to be no doubt that the 
V. A. guidance program has been a tremendous stimulus to 
counseling. Furthermore, about four-fifths of 154 institutions 
intended to continue the centers as part of their college
personnel programs even though only half of these schools had
maintained a central counseling facility prior to the
establishment of the V. A. service.

Another very vital trend is the rapid professionalization 
of the staffs of counseling services in colleges and universities. 
The position of clinical counselor has been clearly defined and, 
in some institutions, is officially established in terms of training 
standards, personal qualifications and duties. Reference to the 
above-mentioned Directory indicates that very high standards 
are being maintained by colleges and universities in their 
selection of directors and professional personnel for central 
counseling services. Thirty of the fifty listed college centers 
are led by persons with doctoral degrees and only one director
was without some advanced degree. These directors included
thirteen people with the ABEPP diploma, twenty professional
members of N. V. G. A. and some thirty-four Fellows or 
Associates of the American Psychological Association. The 
professional staffs of these centers included approximately 200 
counselors, forty clinical psychologists and 100 psychometrists. 
The Directory provided opportunity to indicate how many 
professional employees were certified (Professional member 
N. V. G. A., Associate or Fellow of Division 17, A. P. A.,
diploma of ABEPP, State certification). Approximately 50 per
cent of the counselors, 80 per cent of the psychologists and 
17 per cent of the psychometrists were designated as certified 
personnel. 

A very important development is represented by the growing 
use of central counseling facilities in the training of graduate 
students who plan to be clinical counselors. Carefully planned 
and supervised internship and practicum experience in
counseling centers have become the crowning factors in counselor
training in a number of institutions. The central counseling
facility, regardless of size, can also make a real contribution 
to on-the-job training of faculty counselors. In some situations, 
a planned system of rotating faculty members through tours of 
duty in the central service has vastly improved their training 
as semi-professional counselors. 

There is no need to elaborate upon the trend toward increas¬ 
ing emphasis upon student personnel research in college 
counseling agencies. They are admirably situated and
excellently staffed for this purpose. Already, the college personnel
movement owes a mighty debt to certain of the more
established counseling services which have produced a large amount
of highly significant research in connection with their studies 
of students and counseling techniques. There is an unlimited 
future for development in this area, but careful programming, 
and cooperative planning by counseling services must be 
achieved, if optimal results are to be obtained. 

The rapid expansion of special services in college counseling 
agencies is another characteristic of present development. 
Specialized counselors are being provided to assist students 
in such areas as reading, study, human relationships,
preparation for marriage and marital adjustment. This growing
tendency toward specialization results in a sort of clinical 
approach in which several experts share in the analysis and 
counseling of the individual when this is demanded by the 
situation. The field of counseling has become so complex 
and its literature and techniques so extensive that a certain 
amount of specialization is necessary. However, caution 
should be exercised in this connection for overspecialization 
could dangerously threaten the close personal association so 
important to a satisfactory counseling relationship. 

A final tendency, and a very significant one, is the movement 
toward the improved integration of central counseling services 
with the other phases of the total student personnel program. 
There is much to be done in this area, especially in the case of 
more insular counseling agencies. The task is simpler with 
smaller, more flexible central counseling facilities. This matter 
should be carefully considered by the many institutions which 
are converting their counseling centers for veterans into
student agencies, since there will in such instances be no
established interests and policies to obstruct progress toward 
the integration of all student personnel services. 

The Functions of a Central Counseling Facility

This paper will be concluded by the presentation of a 
schematic system for outlining the functions of central
counseling facilities in colleges and universities.

The responsibilities of such a counseling service may be 
represented effectively by a pyramid with a tri-lateral base. 
This sort of diagramming appears reasonable, since the central 
counseling facility can assume a very vital and focal position 
in the total personnel program of a college or university. 
The significance of this position is enhanced by the growing 
tendency to consider counseling as a basic educational process, a 
viewpoint which has recently been strongly emphasized by 
Williamson. 11 

Three functions or services are suggested by the base of this 
pyramid. The first, and perhaps the most important, is Side A,
which represents direct, personal assistance to students through
the media of self appraisal and/or counseling. There need
be little concern regarding this function, for efforts to meet
individual needs have been a characteristic aspect of college 
counseling services since their origin. 

Side B of the base is the essential function of training. 
There are at least three principal areas of training to which 
the central counseling facility can and should contribute. 
One is the continuous responsibility for stimulating and 
up-grading the staff of the center itself through organized 
programs of on-the-job training. Another is the training of 
various counselors in the institution (usually faculty or
residential) who are contributing to the total job of individual
work with students. The third is the task of providing an 
opportunity for internship, or practicum experience, for 
graduate students who are specializing in the field of counseling. 
This need will arise only in the larger colleges and universities, 
but when it is possible, the integration of counselor-training 

11 Williamson, E. G. Counseling Adolescents. New York: McGraw-Hill Book
Company, 1950. 548 p.




and central counseling activity can make a real contribution 
to both training and counseling. 

Side C of the base is the function of planned assistance to the 
other agencies in the institution which are engaged in the 
general task of counseling students. This represents the most 
neglected side of the figure. Little progress can be made in this 
direction until the problems of general integration mentioned 
above have been worked out. However, it is obvious that the 
central counseling facility, through its access to personnel 
data and through the insights and experiences of its staff, 
can render invaluable assistance to other counselors in the 
institutional personnel program. 

It is proposed that the altitude function of this pyramid be 
considered as deliberately planned and programmed research. 
The scientific study of students, and of the efficacy of methods 
used to assist them, will give body or volume to the entire 
program suggested above. Research may be directed at any or 
all of the three basal functions outlined: service to students, 
training, or assistance to the general and non-professional 
staff of counselors. The absence of this research emphasis
reduces the central counseling facility to a plane surface, 
without body or volume. The applications of research can 
vastly enrich any of the approaches which are made to serving 
the three functions of the central counseling facility which 
have been outlined here. 

In conclusion, it may be stated that the maximal value of 
central counseling facilities can be attained from the filling out 
of the pyramid suggested in this paper. There is nothing about 
the representation which needs to be conditioned by the size 
or number of employees of central facilities. The small central 
facility, manned by one counselor, can fill out the pyramid as 
effectively as the great college counseling center. The important 
facts are that the central counseling facility should contribute 
to (1) service to students, (2) training, and (3) assistance to
extra-center personnel who are counseling students, and that 
there should be a dominating scientific approach to all that is 
undertaken in these areas. 



Presidential Address 
NO VAIN IMAGININGS 

THELMA MILLS 

Director, Student Affairs for Women, University of Missouri 

There is a fable about an ancient King, who, troubled by 
the economic woes of his people, called upon the economists 
of his kingdom for advice. Confused by their conflicting 
theories and counsel, he commanded them to prepare a short 
and simple text on economics for him. After many months 
they brought him many volumes replete with charts and 
graphs. In fury, the King banished half of the economists and 
commanded the other half to produce a text which he could 
understand. One after another they made reports that went 
over his head, and one after another they went into exile. 
Finally, all but one economist was gone. In fear and trembling, 
this last economist appeared before the King. "Your Majesty," 
he quavered, "I have reduced this subject of economics to a 
single sentence. In nine words I will reveal to you all the 
wisdom to be distilled from all the economists who once 
practiced in your realm: "THERE IS NO SUCH THING AS 
A FREE LUNCH!" 1 

As I speak to you today I am much like the last economist 
because, by asking the guests at the head table to join us and 
pay for their own luncheon, I have proved to them that ACPA
economics is no less rigorous. In another way I resemble the 
last economist, because I have set for myself the task of 
presenting a composite picture of the aims and aspirations 
of the presidents of ACPA during the past two decades. From 
the study of these reports came my title, "No Vain Imaginings,"
for I found that not only were they sound in their thinking, 
but, also, profound. They did not vainly hope for their plans 
to be made realities, as you, too, will see in the next minutes of 

1 From an article in Steelways by William J. Grey.

presentation. So settle back and prepare to enjoy a family
reunion where we again gather together after the wars (personal, 
professional and actual) to evaluate what we have been doing. 
A reunion always calls for introducing some of the older 
members to the newer arrivals in the family circle, as well as 
to the guests, and our ACPA reunion for the fourth time at 
"Our” Atlantic City palatial home is no exception to the rule. 
To introduce all of the new family members to the old would be 
impossible, for our family has grown from a recorded ninety in 
February, 1932, to 894, paid as of March, 1950. May I recognize 
the 16 who are still active members of the Association: 


Fredericka Belknap 
Don Bridgman 
A. J. Brumbaugh 
Frances Camp 
M. D. Helser 
J. A. Humphreys 
Esther Lloyd Jones 
Forrest Kirkpatrick 


James McClintock 
Harriet E. O’Shea 
Luther Purdom 
Helen Voorhees
Edith Weir 
Mary A. Wegner 
Lewis Williams 
Robert Woellner 


We came of age with our 21st annual meeting in 1948, and 
so the following year our president, C. Gilbert Wrenn, had us 
analyzing ourselves to see whether in our adult life we were 
socially effective personnel workers. In "The Fault, Dear 
Brutus,” he asked us to discuss with him the psychological 
problems and temptations of college personnel workers and to 
think of some of the possible solutions. Now I am sure that our 
sixteen long-term members must have met the first of his 
prerequisites to real maturity, "have fun from our associations 
with people,” or they would not be here today, nor members, 
continuously, of the Association. 

Now let us turn our attention to the Association, ACPA, 
and see how it has accomplished the hopes and aspirations 
of its twelve presidential leaders through the years. I should 
like for the record to mention them and their schools. 


1923-25  May L. Cheney         University of California
1925-27  Margaret Cameron      University of Michigan
1927-30  Francis F. Bradshaw   University of North Carolina
1930-33  J. E. Walters         Purdue University
1933-35  Karl Cowdery          Stanford University
1935-37  Esther Lloyd-Jones    Teachers College, Columbia
1937-39  A. J. Brumbaugh       University of Chicago
1939-41  Helen Voorhees        Mt. Holyoke College
1941-44  E. G. Williamson      University of Minnesota
1944-47  Daniel D. Feder       University of Illinois
1947-49  C. Gilbert Wrenn      University of Minnesota
1949-51  Thelma Mills          University of Missouri

Twelve presidents, and may I call your attention to the fact 
that five of them have been women, elected by popular vote.
This delineation of presidents has been for the record so that 
the younger members of the Association may have a ready 
file of reference. 

May I review for you, from the reports, what these
representatives of yours hoped for and did accomplish. In February,
1923, a group of persons interested in placement met in Chicago
and, as a result, organized in 1924 the National Association of 
Appointment Secretaries with 79 members. From the
beginning it was recognized that placement was only one phase of
personnel philosophy and practice. The "personnel idea" was 
spreading in colleges, and the Appointment Secretaries’ 
Organization seemed the logical one to help pioneer in a 
growing program. Thus, a committee was appointed in 1926 to
work with other groups, including the National Vocational 
Guidance Association, the National Association of Deans of 
Women, the Department of Superintendents, the Personnel 
Research Federation, and the National Committee of Bureaus 
of Occupations, in the planning of joint meetings. A community 
of interests rather than any thought of merging into an over-all 
organization brought these early leaders together.

In 1929, the name was changed from National Association of
Appointment Secretaries to National Placement and Personnel
Officers. In 1930, in Atlantic City, a new constitution was 
proposed and the following year in Detroit the name was 
changed to American College Personnel Association. The new 
constitution was adopted and sectional divisions were set up 
in Educational Counseling, General Placement, Personal 
Counseling, Records and Research, and Teacher Placement. 

The 1932 annual meeting was devoted to a "Study of 
Personnel Activities in Members of the Association.” Here I 
must interpolate that "institutions" were the first members,
hence a study of personnel activities in members was in no
way a "Dies Committee" hunt nor an F.B.I. investigation.
The declared purpose of the Association was to increase the 
number of departments of personnel in Colleges and Universities 
by offering free advisement with ACPA officers. Ninety-five 
Colleges and Universities were members of the Association and 
seventy-one of them returned the data which were used for 
tabulation. May I quote from the paragraph on trends: “of the 
15 college personnel departments expressing a trend regarding 
administration of work of the department, seven indicate greater 
centralization of personnel activities; 12 expressed a trend 
toward better guidance; 11 reported a trend toward more 
general employment work; 6 departments expressed a trend 
toward more and better teacher placement.” 

Three items were of particular interest in the history of the 
Association during 1932: there was affiliation with the National 
Association for the Advancement of Science; the Annual 
Report was published as a separate publication for the first 
time; and the Association appointed, at the request of the 
U. S. Civil Service Commission, a committee to make a study
of opportunities for women in government, with Mrs. Chase
Going Woodhouse as chairman of the committee.

By 1933, we had found that prosperity had permanently 
disappeared around the corner. Presiding at the tenth annual 
meeting, Jack Walters described the Minneapolis conference 
as one of quality rather than quantity, with comparatively 
few attending because of depression and reduced budgets for 
traveling expenses. It was a year devoted to the preparation 
of a clearer statement of personnel principles and functions; 
to the establishment of higher standards of professional work; 
and to the search for a practical method of judging the
effectiveness of college personnel services. The trend toward effective
coordination of associations and agencies interested in guidance 
and personnel continued. Under the inspiring leadership of 
Dr. Harry Kitson a Coordinating Committee met with Dr. 
Keppel of the Carnegie Foundation to seek for ways of unifying 
the ten Associations “through headquarters, cooperative 
planning of programs of research, yearly activities and 
conventions, and joint publications.”

It is interesting to note that the need for a permanent
secretary was discussed at the 1933 meeting. The present 
need is even more urgent. This is one of our vain imaginings 
because of budget. Until we raise our dues, or increase membership
far beyond the now “nearly one thousand,” the acquiring
of a permanent secretary will remain in the planning stage. 

In 1934 Karl Cowdery stated that the purpose of the year’s 
work was “to approve more cooperative action with guidance
and personnel groups.” The 84 individual members and 18
institutional members voted approval of this purpose and 
planned to join the American Council of Guidance and Personnel 
Associations with the following member Associations, four of 
which still remain members: 

*American College Personnel Association
Institute of Women’s Professional Relations
*National Association of Deans of Women
*National Vocational Guidance Association
National Federation of Bureau of Occupations
Personnel Research Federation
Southern Women's Educational Alliance
Teachers College Personnel Association

American Association of Collegiate Registrars (Affiliated)
*National Federation of Business and Professional Women’s
Clubs (Affiliated)

Research was the dominant theme of the 1934 conference. 
The papers were definitely slanted toward "the personnel 
point of view," and, more particularly, to "individualized 
problems of students." 

Dr. Grayson Kefauver, of Stanford University, keynoted 
the 1935 convention with his address on "Developments In 
Educational Institutions." The contrast between the mech¬ 
anistic and individualized philosophies of education was 
sharply drawn. He made it clear that personnel policies should
be formulated in terms of the latter philosophy. 

As an Association, this was a year for action. Seven thousand 
names of college staff members throughout the country, who 
had responsibility for personnel functions, were contacted to 
further professional solidarity in the personnel field at the
college level; a formal offer was made of the services of the
Association to the federal government “in making and executing
plans for the services of youths between 18 and 26 years of age.”

* Current members of the Council of Guidance and Personnel Associations.

The theme of the following year (Problems of Personal 
Adjustments in Moral, Religious, and Social Relations) 
reflected the same point of view. J. Hillis Miller, then Associate
Commissioner of Education in New York State, raised a
fundamental question: “Is personnel work an adjunct to, or
is it education itself?” The question was clearly answered. 
Personnel services are not superimposed upon the educational 
process; they are an integral part of it. 

Vice-President Hopkins gave us the follow-up twelve years 
later, in 1948, in his paper on “The Essentials of a Student 
Personnel Program.” The whole-hearted response to an 
individualized philosophy of education, accepting the theory 
first and then putting it into practice, means that it must be 
written into the educational philosophy of each institution 
and considered to be the means of education, not adjunct to 
education. 

As an Association, we voted to continue as a member of the 
ACGPA and request the Council to continue its three committees
(Research, Publications, and Coordination). Dr.
J. E. Walters proposed “an investigation into what personnel
services are being rendered at present in different colleges and 
universities.” He urged that a committee of three be appointed 
to initiate research projects such as (1) the advisability of 
formulating a statement of types of preparation offered for the 
training of personnel workers, (2) the preparation needed for
college personnel work. (Corrine LaBarre made such a report 
in Columbus, Ohio in 1947.) 

Another action worth commenting upon dealt directly with 
us—the placement needs of our own membership. It was 
proposed that the Chicago Collegiate Bureau be used as a 
clearing house for filling personnel positions and placing 
personnel workers. We are still working on such a program, 
as you will hear on Wednesday. 

The personal element enters into the next step in our history 
for it concerns my own first attendance at an annual meeting. 
New Orleans was the place, 1937 the year. Esther Lloyd-Jones
talked on “What is this thing called Personnel Work?” She
defined the more immediate needs of the association as:

1. a continued effort to clarify the nature and scope of our 
professional field, 

2. fundamental modification of the constitution to conform
to our changing conception of the nature of the personnel 
program in higher education, 

3. and continued, careful, patient but aggressive attempts to
cooperate within the A.C.G.P.A. with the guidance and
personnel groups in the Council, (Akron). 

This was also the year that W. H. Cowley gave us “A
Preface to the Principles of Student Counseling,” stating three
fundamental characteristics of counseling: 

counseling as the personalization of education, 

counseling as the integration of education, 

counseling as the coordination of student personnel services. 

He defined counseling broadly, “seeing the student and working
with him as a whole person.” No vain imaginings, for again
we read this as a follow-up on the thinking of ACPA members
expressed two or three years earlier. 

Our 16th annual meeting was held in Cleveland, with
Dr. A. J. Brumbaugh speaking on “Personnel Services in the
Light of Current Trends in Higher Education.” After presenting
to us the “unitary nature” of the early American college, with 
the basic curriculum, he showed that as time advanced the 
program of colleges became more diversified, both as to scope 
and content, and that “fan like, higher education extended 
wider and wider in more divergent directions.” By the 19th 
century we had denominational colleges, women’s colleges, land 
grant colleges and specialized types like art, business, and 
normal schools, an increase from the 10 unitary colleges before 
the Revolution to the 2000 institutions of higher education 
today. Now, after the first of the 20th century this elective 
system is indicted in many quarters on the ground that it 
has led to early specialization at the expense of a broad liberal 
education. The assumptions of that period point “to a unified
and generalized educational experience in direct contrast
to the specialization that has prevailed.” Just what shall be
the nature of General Education is still a matter of opinion and 
experimentation.

A new movement, in quite another direction from that
of the return to liberal arts, has developed. The leaders of this
movement believe that the essential unity of general education 
is not achieved through curriculum, but through the educational 
experience of the individual. Thus, the interests and aptitudes
of individual students must constitute the focus of this education,
in which the student acquires self-discipline, integrates learning
with experience, and functions creatively in the society in which
he lives. These two trends both attempt to achieve an essential
unity in college education, one by way of intellectual disciplines, 
the other by way of individualized educative experience. 

The personnel services provided in any college must be based 
upon the purposes of the college and the needs of the students. 
Some services will be the same in all institutions regardless of 
individual differences because some of the student problems and 
difficulties will be the same, such as: selection of a college, 
variations in student interest and abilities, choice of vocation, 
the social development of the students, the health of the 
student, financial aid. The effective functioning of the intellect 
depends upon many collateral factors, as well as the free and 
disciplined intellect. 

Functionally, as an Association, the year 1939 was memorable 
in our history. The CHARTER for the ACPA became a 
published reality. This charter was drawn by “the Commission
on Reorganization of the ACPA” appointed in 1937 and
composed of Esther Lloyd-Jones, Karl Onthank, and C. Gilbert 
Wrenn. Basic to the preparation of the charter was the view¬ 
point reflected in The Student Personnel Point of View, a 
brochure published by the American Council on Education. 

This was the year that a committee with Edith Weir as 
chairman was appointed to write the history of our Association. 
At the preceding convention which celebrated the 15th Anni¬ 
versary of ACPA it was found that only a fraction of the 
members knew the early thought and effort which brought 
about our organization. Thus, it seemed time to review our 
past and to secure, from the early members, the information 
which only they possessed. 

The history was to be a compact record covering the various 
periods of growth from problems of teacher placement to 
the broader personnel phase, and the effort to develop programs
covering varieties of endeavor with the resulting changes
in names. Mrs. Cheney's forty years of placement work had
given her invaluable knowledge of early personnel development 
not to be obtained from any other source, and it was felt 
that this should be recorded while she was still alive. Mrs.
Cheney was given a life membership with full privileges in the 
Association. 

The need for regional groupings was discussed for the first 
time. May I quote: “to organize regional meetings in many
sections implies a great deal of preliminary missionary work
on the part of the membership committee.” No action was
taken. The Committee on Relations with Faculty Advisors 
reported that the Committee had not yet advanced to making 
any recommendations concerning an invitation to faculty 
advisors to become members. The recognition of needs at 
one convention and their implementation at another have 
characterized the pioneering of our organization from the 
beginning. An even more appropriate illustration is found in 
Daniel Feder’s suggestion at the 1940 meeting that we foster
the establishment of a journal to print research in educational 
personnel and other closely related fields. This did not become a 
reality until 1944. 

The St. Louis Convention was held under the leadership 
of Helen Voorhees. A membership of 239 was recorded. At
this meeting a panel prognosticated on “The Future of Student
Personnel Work” with the major issues summarized under (1)
the functional curriculum and student personnel work, (2) 
the teacher and personnel work, (3) and the need for a strong 
national organization. This seemed to be such a forward 
looking program that I wish to bring the summary, via the 1950 
proceedings (Vol. XVII, p. 19), to you in full:


The panel discussants included: A. J. Brumbaugh, C. F.
Malmberg, H. W. Bailey, H. D. Bragdon, D. Stratton, and
II. H M01rl.mil.

The major points of issue are summarized under the following 
headings: The functional curriculum and student personnel 
work; The teacher and student personnel work; The need 
for a strong national organization. 

The functional curriculum and student personnel work: The
present need for student personnel work in our colleges and 
universities arises largely from a curriculum centered in subject 
matter rather than in student needs. Furthermore, this curriculum
is taught by instructors who are narrowly trained
in subject matter areas. The functional curriculum, if and 
when we adopt it, will probably preclude the necessity for 
having personnel officers, at least of the same type as at 
present. This point of view is generally held by most personnel
workers. However, curriculum changes never occur with light¬ 
ning-like rapidity. One discussant who had made a careful and 
exhaustive study of the history of higher education, maintained 
that the functional curriculum will not dominate higher educa¬ 
tion for several centuries. In the meantime, personnel workers 
have much to do. Others hold that a real possibility exists of a 
radical change in higher education. If institutions of higher 
learning do not change from their traditional ways, mounting 
economic and social pressures will force changes. 

The teacher and personnel work: Can teachers be trained in
the personnel point of view so as to take over a large number 
of functions now administered by personnel officers? One point 
of view maintains that college teachers cannot and will not be 
trained in personnel methods and viewpoints because of the 
nature of graduate training and the traditions of research and 
scholarship. By and large, college faculties are recruited from 
the graduate schools. Graduate training is oriented toward 
research, not toward students or teaching. Furthermore, aca¬ 
demic rewards are not won by the Great Teacher, but by the 
Great Scholar. 

The opposite point of view, held by a large number of 
people, is that all teachers should be trained personnel workers. 
With such additional training, teachers would do a better job 
of teaching and students a better job of learning. While this is 
acceptable in regard to secondary teaching, the adherents of 
the first point of view hold no hope that this can be accom¬ 
plished at the college level. They maintain that student 
personnel specialists would still be necessary even with a 
functional curriculum and with the student point of view. 
They will grant that the faculty may play a role in the instruc¬ 
tional type of personnel work; e.g., remedial reading, how-to- 
study, etc. 

In a few places, faculty members and graduate students 
who expect to teach are taking courses in methods of teaching, 
personal counseling, etc. Summer workshops, such as are offered 
at several centers, are organized to give personnel and teaching 
experience to college teachers. These innovations are unique, 
however. 

The need for a strong national organization: One viewpoint 
maintains that college personnel work will always be a sideshow 
of education unless we have a strong national organization 
which unifies all the branches of student personnel work. The 
opposition states its point this way: Strong national organiza¬ 
tions have a point and push it. Student personnel work is not 
yet ready for such a vigorous program. We are still experimenting.
Let us not hamper experimentation by adopting
dogmatic attitudes. This is true not only of personnel work
but of higher education as well.

The Council of Guidance and Personnel Associations is one 
attempt at unifying the field and providing a strong national 
organization. Some hold that it is not broad enough, that such
personnel groups as the registrars associations, the health
officers, the union managers, etc., should be represented.
Others state that all college personnel organizations should
unite into one association, divorcing themselves from organizations
which are made up predominantly of secondary school
people. 

If representatives of college personnel services want to form
a unified organization, no blueprint is available. They will be
obliged to work out their plans in conference, in regional
meetings, and in group discussions. First, however, they must 
study the problem in terms of the needs of personnel work.” 

It was at this conference that we also had the first emphasis 
on group dynamics presented. Dr. Ruth Strang reported upon 
her research in the field of group work and techniques:

Work with groups often constitutes a more successful way 
than counseling of attaining empathy with individuals, of 
encouraging them to express their emotional problems, of 
providing constructive outlets for their impulses and of relieving
their tensions and anxiety. . . . Economy is a factor in the
development of group work. In counseling, needs of individuals 
for certain group activities are discovered. Group activities 
serve as avenues of adjustment, thus they have both diagnostic 
and therapeutic values. 

Atlantic City, February 18-24, 1941, the 18th Annual 
meeting! Membership, 256. The Association gave serious 
attention to new membership requirements, “professionally 
trained persons and other interested, experienced and competent
workers.” The membership approved of “dignified, slow
expansion and growth.”

The highlight of this meeting was the presidential address.
President Voorhees spoke to us on “The Responsibilities of the
Heritage of Personnel Work”:

We hear much these days, of the advances which have been 
made in personnel methods, but for the time being, I should 
like to look back to the past, to the beginning of personnel work.
Our predecessors had a rich background in an allied field of 
education; an experience which had given them a firm conviction
and belief in the eternal verities. And they had marked
success in carrying out their educational aims and purposes. 
They transferred their sense of values from the pulpit to the 
field of education. They came to their task possessed of the 
wisdom which comes only from a wide knowledge of human 
nature and its frailties; teaching the virtues which are necessary 
for living and for satisfaction and achievement. 

The purpose of this presidential address was to present some 
aspects of our work which are sometimes forgotten, the spiritual 
values of our profession. 

Great characters, not just great scholars, were produced.
Men devoted to service, with initiative, self-reliance and 
democratic ideas. 

Have our methods tended to emphasize personality rather 
than the necessity inherent in each of us of becoming a person 
in one’s own right? 

I am eager that one of our openly avowed objectives shall 
be to give the young people in our care some philosophy of 
life which will make it possible for them to get their bearing, 
no matter what happens. 

She was successful both in her presentation and purpose. 

In February, 1942, E. G. Williamson faced an especially 
difficult period of administration. Despite the war, our Presi¬ 
dent led us to think of that future, which lies beyond the 
present, in personnel work. He showed us that anything which 
leads to more effective conservation or utilization of youth’s 
potentialities actually does contribute to society’s welfare, 
as well as to the winning of the war. Hence, his address on 
“The Future Develops Out of the Past” was a highlight. 

Whatever our professional and personal behavior is as 
personnel workers, one thing is quite clear. Unless we have 
had the benefit of professional training and experience which 
prove to be effective in our handling of post-war problems, 
then we may expect that society, including college students 
themselves, will push us aside and find other types of personnel 
workers or other types of educational workers to handle this 
type of social revolution. The pressure for a solution to these 
problems, the greater articulateness of students and parents 
and the competition for public favor and support from members 
of social and government agencies, will force college admin¬ 
istrators to deal effectively with this anticipated situation. If 
we cannot do the job, then others will be found to do it. 

I believe that we are adequately prepared for the task and
that we will make an effective contribution to the conservation
of ti«t Ail iJivtlhm, religion and social and personal values.

I believe that our contribution will be such as will strengthen
our place in higher education and will increasingly attract
graduate students to secure the necessary professional
and personal training to make college personnel work a significant
part of higher education, competing successfully with
other social welfare professions for the best talents in each
student generation. 

The numbers that have been entering our profession have 
borne out his faith in the personnel program. 

From 1943 to 1946 the only record of the Association is that 
of the minutes kept by the secretaries and the record of the 
New York meeting in 194,1 when only a limited group met 
under the leadership of CGPA. Our officers and members were 
aiding in the promulgation of war programs in every branch 
of the service; those members left on the campus were doubling 
as personnel officers and campus recreational leaders for the 
military services on our campus. A record of these activities 
would be almost a complete history of the war activities. At a 
meeting of the Executive Council held in Chicago in December,
1945, a Personnel-o-Gram was born, with Fred McKinney
named as the first editor. The Council attempted to keep the 
membership informed through the media of the Educational 
and Psychological Measurement, published by an ACPA 
member, G. Frederick Kuder. The following ACE brochures, 
for which ACPA members had been chiefly responsible, were 
purchased and distributed to the membership:

“Counseling and Postwar Educational Opportunities.”

“Student Personnel Work in the Postwar College.”

Active participation in the work of CGPA was continued with 
special attention given to regional conferences. “Judicious
publicity” was carried on by sending a letter to some nc»
college presidents concerning the Association and enclosing a
paper written by Dr. John Darley on “Counseling and Colleges
in Post-war Education.”

By our first postwar annual meeting, held in 1947 at Columbus,
Ohio (moved from Chicago, by consent, when the
Stevens Hotel would not promise to accommodate all our
members without discrimination), our Annual Reports were
resumed and issued as a supplement to Educational and
Psychological Measurement. During this meeting we were
definitely interested in post-war personnel services. In his 
presidential address, “When Colleges Bulge,” Dr. Feder 
awakened us to the imperative need of making an immediate 
and reasonably adequate adjustment to things as they are and 
not as they were in the nostalgic good old days. 

The problems discussed were common to all of our campuses: 
(1) the changed nature of campus population, (2) the generally
changing motivation and orientation of all college students, 
(3) the need for high caliber professional services in vocational, 
educational and personal counseling of all students, (4) special 
problems, caused by previous military treatment of situations 
similar to those in classrooms, (5) ways in which the integrated 
personnel service program may serve both faculty and student 
body in more effectively meeting student needs. New services 
were being offered to students in their quest for maturity. 

President Feder called our attention to the fact that the 
field of student personnel work has suffered from an ill of its 
own making, the tendency to divorce its findings and activities 
from those of the classroom. As a matter of routine, he insisted,
we must transmit to the instructional staff those findings 
regarding student reactions and needs which will assist the 
faculty in the infusion of the realities, meaning, and purpose of 
contemporary life in the classroom. 

In Chicago, in April, 1948, C. Gilbert Wrenn, our “Chief,”
spoke to us on the “Greatest Tragedy in College Personnel 
Work.” It is worth our while to review these tragedies briefly 
for they point the way, just as the meeting, in 1940, on “The 
Future of Personnel Practices,” gave impetus to developments 
of the early forties. Guidance, as a term, was buried, and 
personnel, with its appropriate adjective, was nurtured, so 
that we might speak in common terms with school and non¬ 
school agencies about our concepts, as written into the Charter 
of the Association. Counseling was relegated to its appropriate 
position as one of a number of personnel functions and not the 
entire personnel program. 

Outstanding among these developments are the increased 
participation and consequent demand for professionally
equipped personnel workers; increased facilities for personnel
research; and not least important, an increased humility on the 
part of all of us. 

He was not less concerned to point out pressing problems 
needing to be resolved:

1. The lack of commonly accepted standards of performance
and professional preparation. 

2. Students and faculty, who have the most to gain from
student personnel work, have the least to say about its 
development and emphasis. 

3. Poor coordination of a student personnel program is frequently
the result of an incompletely formulated line and
staff organization.

4. A student personnel program on a campus tends to be 
isolated from four important influences in the life of the 
student, (a) home, (b) secondary schools, (c) college class¬
room, (d) spiritual resources of the campus. 

As a final imagining I wish you to think briefly with me about 
human relationships, a field in which some of us may be 
devoting more of our time than to the more technical areas. 
Certainly this is a field in which we can never become com¬ 
placent with our achievements. The nation’s colleges and 
universities, today, are placing more emphasis on producing
well-rounded citizens. How would you answer the provocative 
question raised by the Pennsylvania Association of Deans of 
Women, “Do you improve human relationships through your
guidance services?” Each of us must answer the question!

When enough people can answer the question in the affirma¬ 
tive we shall indeed have arrived at the place in the personnel 
profession where we do not have to rely on vain imaginings. 
One does not need to sentimentalize the point. There is in¬ 
creasing evidence of a deepening unity among individuals 
and groups devoting themselves to the improvement of human 
relationships through personnel services. 

On a tablet in front of the Old South Meeting House, in 
Boston, are words that describe our Revolutionary forefathers 
as "worthy to raise issues." They knew which things were 
important and which were unimportant; and a person must 
be mature to raise issues. Most of the small frictions in life, 
human misunderstandings, that destroy mutual confidence
come from raising issues that are not worth raising, and most
of the social inertias and timidities that keep our world from 
moving toward its ideals express a reluctance to raise issues 
that should be raised. One of our great responsibilities is to 
bring more reasonableness into the human scene, to bring it
to ourselves and to others. To carry our share of this responsibility
we need to see in whole, instead of in part, to be ready
to act responsibly where responsibility is called for, to forget 
ego and to seek wise understanding of others. “Where there
is no vision the people perish,” and our part is to seek to make
mature individuals of ourselves and others, that we may bring 
about a community where human beings may realize their 
visions. On a recent drive in my state I passed through three 
neighboring communities, Vista, Fairplay and Humansville, 
which have given me a new philosophy for daily living, “Where 
the vista is right, there will be fairplay in humansville,” and 
you and I must make it come to pass. 



EVALUATION AND RESEARCH IN GROUP 
DYNAMICS 

KENNETH P. HERROLD 

Professor of Education, Teachers College, Columbia University

UNDERSTANDING and evaluation of group dynamics must
be in terms of the nature of the times in which men live.
Industrialization has led to the wholesale, and sometimes 
indiscriminate, application of the scientific method to the 
material universe. The phenomenal and, at times, ghastly 
social and technological changes of this century have led to 
collective hysteria in one form or another in all parts of the 
world. The future of social science and, more important, the 
fate of mankind depend upon whether or not the populations 
of the world can adjust their living to the atomic era and live 
and work together intimately and creatively. 

The search for a science of human relations is not new.
Some have denied that it ever would be possible to apply 
scientific methodology to the processes of human relations 
and at the same time to preserve individual freedom and a 
democratic society. These are legitimate challenges. Others 
believe that we must develop new forms of social discipline 
for interpersonal relations. There has always been appropriate 
skepticism of such suggested social innovations and inventions, 
and there has always been some inappropriate skepticism 
concerning the contributions of social scientists like those 
engaged in group dynamics research. 

Group dynamics is a term usually associated with certain 
concepts and procedures of research and study identified 
with Kurt Lewin and the Research Center for Group Dynamics 
first established at the Massachusetts Institute of Technology 
and later moved to the University of Michigan. However, 
group dynamics has aroused the interest of other social scien¬ 
tists who have never worked intimately or directly with the 
Lewinian group. These social science explorers are contributing 
valuable knowledge to the understanding of how groups
behave. Lewin, Lippitt, White and their associates have
represented a strong team. Their work, such as the Iowa 
studies of the “Social Climate of Groups,” 1 has stimulated 
thought and controversy which has forced many others to 
consider basic problems of group and intergroup behavior 
before they might otherwise have done so. It is necessary and 
appropriate to indicate that responsible research and training 
in group dynamics is now being carried on at Harvard, Min¬ 
nesota, London, California, New York University, Columbia, 
Northwestern, and many other institutions of advanced study. 
There is no question of the status of the staff of the Research 
Center at the University of Michigan. However, to associate 
the development of group dynamics as a respectable field 
exclusively with this Research Center is to limit the progress of 
knowledge and inquiry. 

The turbulence set up by the group dynamics enthusiasts 
has not been universally supported nor accepted. Dean Robert 
B. Browne recently said of group dynamics: 

We are told that we have here something new and basic. 

One would like to remain receptive to what is new and basic 
without prejudice. We are told that here is something scientific, 
from Bethel laboratories and the M.I.T. and Michigan Re¬ 
search Centers. We all have a great respect for scientific 
inquiry and a staunch faith in its usefulness. We are told that 
here is something awfully democratic, and that seems to be 
all to the good. Furthermore, we are assured it’s for leaders, 
which ought to guarantee crowded classrooms where leadership 
training is offered. But just what is this new, basic, scientific, 
democratic leadership training furor, and what is there about 
it that is as yet either new or scientific or democratic or dynamic 
or even useful? 2 

Other words of criticism and challenge have been leveled 
at the proponents of this approach to an understanding of 
group relations. The development of studies of group dynamics 
has been accompanied by considerable misunderstanding, 
misinformation, and erroneous interpretation. The term 
“group dynamics,” therefore, signifies great challenge and
hope to some, and to others it represents but a transient
panacea.

1 Lippitt, R. and White, R. “The ‘Social Climate’ of Children's Groups.” Child
Behavior and Development (R. Barker, J. Kounin, and B. Wright, Eds.). New York:
McGraw-Hill, 1943.

2 Browne, Dean Robert B. From an address delivered at the annual convention of
the National University Extension Association held in Edgewater Park, Mississippi,
May 4, 1949.

It has been said that the social unit approach to understanding 
of human behavior denies the uniqueness of the 
individual. 3 Need it be dogmatically an “either or” relationship? 
Can we ever begin to meet student personnel needs on a 
strictly individual basis? Those of us who are daily confronted 
with impregnable schedules of individual appointments know 
how difficult it is to achieve even a satisfactory quality in our 
counseling relationships. Professional competence in the use of 
groups and in the analysis of group dynamics can be achieved 
without detracting from the “student personnel point of view” 
and without attenuating the warm and friendly relations with 
students. In fact, the relations of personnel administrators 
with students, especially students in groups, may become even 
more respectable the better we understand the behavior of 
people in groups. 

Misunderstanding of the objectives and procedures of group 
dynamics is, in part, due to the rapid growth of the field and 
the customary lack of adequate communication which accompanies 
social innovations. It is also due to the inability 
or the lack of opportunity adequately to define the nature of 
group dynamics. This is regrettable. The purpose of this brief 
paper is to attempt to present: (1) one definition of group 
dynamics research and application, (2) a citation of certain 
problems in its developing research and evaluation studies, 
and (3) a prediction of some of the possible applications of 
such training, research and evaluation in college personnel 
administration. 

It would seem wise first to understand what those interested 
and working in the area of group dynamics are trying to do 
before we examine their research and certainly before an 
attempt is made to evaluate their activities. 

With what is group dynamics concerned? 

First, group dynamics is concerned with an understanding 
of the group related factors, forces, and determinants which 

3 Snygg, Donald and Combs, Arthur W. Individual Behavior. New York: Harper & 
Brothers, 1949. P. 183. 

influence individual behavior in groups and the course of 
social change. Lewin described this goal in his statement of the 
“Frontiers in Group Dynamics.” 4 In general, a group is a 
social organism of describable structure and function. In most 
instances the members of a group maintain a face-to-face 
relationship. Group dynamics is also concerned with the 
achievement of an understanding of groups as groups and the 
fundamental laws which govern the behavior of groups. Ob¬ 
viously such studies must rely upon the theoretical and ap¬ 
plied experience of all the social sciences, but especially upon 
social psychology, individual clinical psychology and cul¬ 
tural anthropology. 

Second, group dynamics is concerned with improving the 
application of already established knowledge and skills of 
human behavior to the critical social problems of our times. 
This objective was made explicit in Lewin’s second statement 
on the “Frontiers in Group Dynamics.” 5 Impatience with the 
manner in which social action has developed led many to 
challenge applied social science in recent years. Sellitz and 
Cook expressed this concern in their inquiry “Can Research in 
Social Science Be Both Socially Useful and Scientifically 
Meaningful?” 6 It has been stated that it requires fifty years 
for society to adopt a new idea or practice. Much of the knowl¬ 
edge of theory and practice now being utilized by those con¬ 
cerned with the study of group behavior was developed and 
established many years ago. Some of the psychologists and 
social scientists who are today most critical of the “group 
dynamics” movement assisted in the early discovery and 
definition of social phenomena which form the theoretical 
base upon which the group dynamics movement is established, 
and these critics continue to reiterate the same or similar 
concepts with respect to social action, with no awareness of 
their commonality with group dynamics. It is often quite a 
different matter to apply social science than it is to write 
about it or to discover its laws in the protection of controlled 
laboratory conditions. In fact, it is always difficult to apply 
what we believe, to reduce the lag between our knowledge of 
social science and our application of that knowledge to problems 
of human relations and social advance. 

4 Lewin, K. “Frontiers in Group Dynamics: Concept, Method and Reality in 
Social Science; Social Equilibria and Social Change.” Human Relations, I (1947), 5-41. 

5 Ibid., I (1947). 

6 Sellitz, Clair and Cook, Stuart W., “Can Research in Social Science Be Both 
Socially Useful and Scientifically Meaningful?” American Sociological Review, XIII 
(1948), 454-459. 

The final aspiration of those concerned with group dynamics 
is to enlist other social scientists from a variety of disciplines 
in the further study of group development, interpersonal 
relations within groups, relations between groups, and the 
basic laws of human relations. This requires an unusual type 
of intellectual maturity and material; it requires a complete 
integration of the basic theory and practice of many disciplines 
of social science. 

The present departmentalization of subject matter and 
professional training has handicapped this type of integrated 
study and practice. Furthermore, the study of group dynamics 
must be experience-centered and many of our institutions 
are not yet ready to utilize the community as a laboratory 
for advanced study and skill development. Time and space 
will not even permit their cooperation in integrated study 
within the institution. Communities uninitiated to such 
working relationships with university or academic men must 
also be cultivated to utilize such resources. One may speculate, 
however, whether the community is more ready to have the 
academic man step outside his ivory tower than is the academic 
man ready to leave the protection of his isolation. This is, of 
course, a criticism every generation makes of the scholar and 
the researcher. Students and professionals in advanced studies 
in many fields are demonstrating the importance of an under¬ 
standing of group dynamics and the application of group 
procedures in their work. A few examples may be cited to 
document this assertion. Dr. Max R. Goodson of the College of 
Education, Ohio State University, has described the implications 
of social engineering in public school administration. 7 

7 Goodson, Max R. “Social Engineering in a School System.” Progressive Education, 
XXVI (1949), 197-201. 

Dr. Edward C. Tolman of the University of California, 
recipient of the 1949 Kurt Lewin Award, in his Memorial 
Lecture 8 added to the theoretical structure of the basic 
concepts of group dynamics in social learning. Dr. Tolman 
stressed the nature of the influence of drives, beliefs, goals, 
perceptual readiness and perceptual blindness. His concepts, 
methods, and findings will more adequately illuminate the 
complex nature of student mores. 

Dr. Harold Fields, of the Board of Education of New York, 
recently reported in an unpublished manuscript the develop¬ 
ment of a group-interview technique used in the selection of 
teachers. This technique utilizes several of the common group- 
dynamics evaluation procedures such as systematic observa¬ 
tions of candidate behavior in situational tests involving six 
candidates. This method demonstrates how individuals behave 
in situations of reality and is a more realistic evaluation than 
paper-and-pencil test material. A similar procedure is now 
being considered for the selection of candidates for admission 
to one of New York’s medical colleges. Numerous professional 
and lay organizations are also utilizing group processes in their 
administrative procedures and in program development. 

It is difficult to discuss research and evaluation in group 
dynamics when the field, as such, is not yet adequately defined, 
or acceptable to many in the professions related to social 
science. The basic philosophy, concepts, values, and skills 
which constitute the common core of group dynamics theory 
and practice will make a fundamental contribution to the 
improvement of human relations and the advancement of 
social science because this core is rooted in established and 
fundamental concepts of the basic social sciences. 

The controversy over the nature and importance of group 
dynamics has been widespread. In fact, the cry of the an¬ 
tagonists is raised so loudly that one is tempted to echo Shakes¬ 
peare, “The lady doth protest too much, methinks.” The 
feelings of Dean Browne are shared by many critics who are 
concerned with: “the cult” of the proponents; “the uncritical 
acceptance” of group-dynamics discoveries; “the verbiage” 
in which they (the proponents of group dynamics) are imbedded; 
“the pseudo-learned special dialect”; “the wild 
oscillation and breakdown which results from the feed-back 
correction and over-correction”; and “the diffuse vagueness 
of the literature on group dynamics.” However, as Dean 
Browne urges, “It would be part of wisdom first to try to 
understand it,” and since this social innovation is established 
upon the theory and practice of a multi-disciplinary field it 
behooves the critic to be reasonably well acquainted with the 
theory and practice of these several disciplines before he 
criticizes the emerging ideas, concepts, practice and research 
of group dynamics. 

8 Tolman, Edward C. “The Psychology of Social Learning.” Journal of Social 
Issues, Supplement No. 3, Dec. 1949. 

What of the nature and trends in group dynamics research 
and evaluation? 

The importance of research in group dynamics, or interpersonal 
relations if you will, is no longer a rhetorical question. 
Human relations today produce problems of major significance 
in politics, industry, medicine, and community living. Education 
has its own personnel problems. 

It is necessary, however, to recognize certain technical 
limitations. Let us consider, for a moment, the requirements 
of research which the workers in group dynamics studies have 
found vexing. Dr. Richard Crutchfield, writing in the American 
Psychologist 9 for the Joint Committee on Public Service 
Standards in Social Psychological Research, reported: 

In its phenomenal growth during the past fifteen years 
social psychology has exhibited certain faults common to any 
rapidly growing field of science. There has been an unevenness 
in the quality of the research carried on and an unevenness in 
the training and competence of research workers. Moreover, 
because its problems have an immediate bearing on practical 
problems of everyday life, the applications of social psychology 
have tended to outstrip basic research. Practical pressures 
will continue to favor the applied phases at the expense of 
basic theoretical research and methodological development 
upon which sound application must be founded. 

Dr. Crutchfield here describes one of the reasons why there is 
so much confusion about group dynamics in the minds of 
practitioners of personnel work and of the social science 
world in general. 

9 In The American Psychologist, IV (1949), p. 112. 

Dr. Donald G. Marquis, 10 in his Presidential Address to the 
American Psychological Association, in 1948, also stressed 
certain difficulties in achieving the objectives of social science 
in the study of group dynamics. Dr. Marquis indicated that 
early research in psychological frontiers suffers for a lack of 
theoretical structure to guide the inquiry, of an accepted and 
adequate terminology, of “standard measurement techniques 
for the relevant variables,” and that it suffers because, as is 
true of all workers in new fields, those engaged in group-dynamics 
research are often “dismayed at the absence of the 
simplest kinds of taxonomic data on the materials of their 
study.” Consequently, early research reports often appear 
to be inferior and quite unrelated. 

These difficulties, described by Crutchfield and Marquis, 
prompt the kinds of criticisms cited by those who challenge the 
group-dynamics approach to an understanding of social 
phenomena, described earlier in the words of Dean Browne. 

The verbiage of every professional in-group tends to exclude 
others temporarily. The terminologies of psychoanalysis and of 
atomic physics were disturbing a short time ago, yet today they 
contribute to the language of every family. It would seem 
necessary, however, for us, who are interested in the objectives 
of group dynamics, to guard against any exclusion by association 
or communication if the positive and constructive contributions 
this vigorous field of endeavor has to make to 
knowledge and skills in interpersonal relations are to be realized. 

Those who are disturbed by the development of interest 
in group dynamics must likewise examine carefully these 
developmental manifestations, so that they may gain proper 
perspective in the examination and use of the knowledge 
that is constantly being developed. The immediate objectives 
of research in group dynamics are to develop: (a) a respectable 
theoretical structure to guide their inquiries; (b) an acceptable 
and adequate terminology; (c) standard measurement tech¬ 
niques for the relevant variables; and (d) the collection of 
adequate taxonomic data. These are difficult tasks which 
require time and the integration of several heretofore apparently 
unrelated disciplines and frames of reference. Those 

10 In The American Psychologist, III (1948), p. 431. 

involved in the development and application of applied social 
science research findings concerning group dynamics either as 
critics or workers will have to proceed with vitality tempered 
with common sense and self-discipline. Recently a Dean of 
Students remarked: 

We are establishing a faculty-student advisory program. We 
do not know how to group the advisees. We do not know how 
to introduce the available faculty advisors into the groups so 
that the group will achieve maximum productivity with respect 
to their needs. How can one form groups and include a faculty 
advisor when there is such a wide variance in individual 
capacity, experience, expectation, and in basic personality 
structure? 

Of course, we can use the traditional unitary systems of 
grouping, such as intelligence or age or problem, but this is 
neither effective nor realistic. Heterogeneity is a conspicuous 
and common characteristic of group relations. It is always 
necessary to learn how to work with people who are different. 
It is naive to educate and to speculate or to rely upon the 
possibility of always being able to work with those who are 
of the same color, the same religion, the same values, and the 
same basic capacities. The problem of this particular personnel 
administrator is to help the faculty advisers and the students 
to learn how to work together with their differences. The 
problem of grouping is made difficult by a number of component 
problems. Dr. Morton Deutsch of the Research Center 
for Human Relations of the New School for Social Research, 
has described in detail the influence of competition and co¬ 
operation on group process and development. 11 Another often 
neglected aspect of group experiences is the psychology of 
learning. In this group guidance situation, learning is an 
important factor. Dr. Herbert Thelen, Associate Professor of 
Educational Psychology, University of Chicago, has made a 
worthy contribution to understanding this aspect of group 
process in a recent review of “Group Dynamics in Instruction: 
Principle of Least Group Size.” 12 As the title indicates, this 
study also deals with the importance of a desirable number of 
participants in a working group. Learning to work together is a 
prerequisite of group problem-solving which no idyllic platitude 
of brotherly love will satisfy. The hazy generalizations with 
which we often diagnose and prescribe for many student 
personnel problems are rarely specific enough to resolve the 
knotty problems of interpersonal relations. 

11 Deutsch, Morton, “An Experimental Study of the Effects of Cooperation and 
Competition Upon Group Process.” Human Relations, II (1949), 199-231. 
12 Thelen, Herbert A., “Group Dynamics in Instruction: Principle of Least Group 
Size.” The School Review, LVII (1949), 139-148. 

A director of student activities who has responsibility for 
student housing states, 

We have practically no social communication between the 
dorm students and the fraternity groups. Our campus has 
achieved no group standards which are commonly accepted, 
and one result of this lack of communication and lack of 
standards is a lack of morale and cohesion in the larger and 
common educational and developmental program. Our rules 
and regulations won’t work. 

In fact, without careful and respectable research there is no 
easy or fruitful answer to this problem. Personnel administrators 
will do well to consider seriously the types of research studies 
now being carried out in the area of group dynamics and 
interpersonal relations. 

These situational problems indicate the need to understand 
the basic dynamics of the interpersonal relations but they also 
emphasize the need for a specific type of research and applied 
skill training in the professional training of student personnel 
administrators. Education and training in applied social 
psychology, group dynamics, and action-research skills, neces¬ 
sary for effective change, are too often luxury items or after¬ 
thoughts in the formulation of a program of professional 
education. The importance of research skills has often been 
minimized in the graduate preparation of our personnel 
administrators in higher education. If college personnel 
administration is to continue to be a respectable and sturdy 
profession more attention must be given to the development 
of student personnel policies and procedures substantiated by 
appropriate and reliable research. 

“After a year and a half of study a committee of five members 
of Princeton University faculty released today a 7,000 word 
report on the state of undergraduate faculty relations at 
Princeton.” Reported in the Sunday New York Times on 
March 19, 1950, this article described “tensions on the campus,” 
the need for “students and faculty to know one another 
better,” the need for “undergraduates to forget ‘the fear of 
apple-polishing’ and to take the initiative with faculty members 
to recognize the obligations of a kind of campus citizenship 
not unlike their civic obligations in the community.” It is 
necessary for the college personnel administrator to have at 
his command the methods and the skills of research which will 
enable him to isolate the basic problems, review the knowledge 
of such problems in other spheres of influence and interpersonal 
relations, initiate sound and revealing procedures for pre¬ 
liminary observations, construct a reasonable theory of 
causation, and verify such assumptions through the application 
of procedures which promise some reasonable assurance of 
reducing the tension. 

The Phelps-Stokes Fund recently released a report 13 of the 
needs of some 400 foreign students from Africa. These students 
came here inspired with a desire “to aid their homelands 
toward independent status or simply to better the lot of their 
fellow-countrymen.” Patterns of segregation and discrimination 
in American colleges and universities embittered these hard¬ 
working, self-sacrificing students who came to American 
colleges because they offered a wider range of courses and 
experiences. This is a personnel problem of our native born 
American Negroes, Jews, Catholics and other cultural groups. 
To ignore or give lip service to theoretical democracy without 
doing something practical about the problem will contribute 
further to our national and international discord. This is a great 
opportunity, and much is being done to analyze the problem 
and to meet it in a concrete manner, but it is with just such 
problems that those involved in group dynamics studies are 
concerned. 

Is it not an appropriate moment for college personnel 
administrators to seek financial support for a series of research 
studies of the most pressing college personnel problems which 

13 This report was made public by the offices of the Phelps-Stokes Fund, 101 Park 
Avenue, and was prepared under the leadership of Dr. Ruth C. Sloan of the State 
Department and Ivor G. Cummings of the Colonial Office. Dr. Channing Tobias 
reported that the project and report were strictly private and not official in character. 

are essentially problems of interpersonal relations and group 
behavior? Requirements of such research work demand a 
team of specialists in group relations, social psychology, 
personnel administration, mental hygiene consultants and 
others oriented to research procedures and also to practical 
problems in student personnel administration. It will be 
necessary to delimit many of the problem areas and to recognize 
that many of these pressing problems are essentially those 
which have to do with group relations or with the potentialities 
of the group as an appropriate medium for the satisfaction of 
certain student personnel needs. 

Psychological research within our time has accomplished 
respectable achievements in comparative psychology, physi¬ 
ological psychology, in the psychology of learning, of mental 
abilities, and in social psychology involving political science, 
sociology, anthropology, and economics. The frontiers of 
research in human relations and of group dynamics are beyond 
the daily practice of most of us. 

Kurt Lewin, a well-spring at the frontier of research in 
interpersonal relations, described in his last writing the cutting 
edges of research in group dynamics. Those who have carried 
on this work have been striving to reduce the unknown. Lewin 
was certain that “the scientific aspects of this development 
(e.g., group dynamics research) center around three objectives: 
(1) integrating social sciences; (2) moving from the description 
of social bodies to dynamic problems of changing group life; 
and (3) developing new instruments and techniques of social 
research.” 14 

The student-personnel administrator can recognize that the 
student and faculty society with which he works provides a 
challenge for such study. Certainly the dean, adviser to 
students or director of student activities, as well as the psycho¬ 
logical counselor, need at all times to remember the functional 
importance of: (1) the social environment and the sociology 
of the community in which the individual lives and works; 
(2) the individual differences in behavior and especially in 
responding to the environment; and (3) the cultural mores 
and standards which influence the needs of the individual and 

14 Lewin, Kurt. Human Relations, I (1947), 5. 

the dynamics of the social units of the collegiate society of 
which the face-to-face group is a basic structure. We can 
utilize the basic knowledge of sociology, individual clinical 
psychology and cultural anthropology, but we must learn to 
apply this knowledge in situations of reality of which we and 
our students are a part. The nature of individual behavior 
and of the behavior of campus groups as groups is constantly 
changing. No complacent, static concept of the nature and 
function of these social units of the college will suffice. The 
personnel administrator must develop more critical procedures 
for the examination of the social phenomena of student and 
faculty life. The approach to such understanding utilized 
in the phenomenological approach to social and group ex¬ 
periences is indeed promising and challenging. Dr. Robert 
B. MacLeod, 15 Head of the Department of Psychology, Cornell 
University, has described certain professional needs which 
many personnel administrators consider of grave importance 
to the development of our professional skills. The first is the 
need for a systematic procedure of observing and describing 
the characteristics of experiences of people in groups. The 
second is the need to suspend many, if not all, of our naive 
assumptions as to the underlying mechanisms which prompt 
the behavior of people in groups. Third is the need to develop a 
set of principles by which it will be possible to determine 
what is happening in our college social and group life, how it 
occurs, when and where. Ultimately we may know why. 

15 MacLeod, Robert B., “A Phenomenological Approach to Social Psychology.” 
Psychological Review, LIV (1947), 193-. 



THE CREATION OF AN EFFECTIVE FACULTY 
ADVISER TRAINING PROGRAM THROUGH 
GROUP PROCEDURES 

IRA J. GORDON 

Associate Professor and Counselor, Counseling Bureau, Kansas State College 
Manhattan, Kansas 

You may recall that last year, at the convention, Dean 
Maurice Woolf of Kansas State left us with the statement: 
"A blissful unawareness of the impossible is all you need,” 
in order to venture forth on a faculty advising program. He 
further laid down for us some basic concepts underlying an 
approach to faculty cooperation. At that time I was a faculty 
member at Kansas State attending the sessions as an on-looker. 
This summer, after joining the Counseling Bureau, the author 
felt that there was a chance to put these concepts into effect 
to a greater degree than had yet been tried. So to speak, 
we decided to demonstrate the efficacy of the concepts, and 
our beliefs in the value of faculty participation in advising. 
Thanks to Dean Woolf’s groundwork, we had a faculty advisory 
program, and a faculty group of 250 who were involved in it. 
The problem that presented itself was the utilization of these 
advisers so that they could function effectively at their work 
with freshmen. The nature of the situation was such that these 
people, to a great degree untrained, had to be exposed to a 
training program of a dynamic nature over a relatively short 
period of time—the Fall Semester. 

Faculty advisers were spread out over virtually one-third 
of the staff in all of the various schools. These staff members 
were responsible each for a small number of freshmen, usually 
ranging from six to ten. These faculty members were ill-trained, 
and many of them were new or had had no training. This 
difficulty was created by an old-line feeling on "the hill” that 
advising required no skill; that any intelligent professor can 
give “good advice” to students. There was also the feeling 
that college students should be mature, and should not require 
help. Some felt that students have no problems other than 
vocational choice. 

The Counseling Bureau did not exert administrative control 
over the advisers. They were under the control and pay of the 
academic deans and their names were furnished to the Bureau. 
Therefore, there was no direct line of authority between the 
Bureau and the advisers. The former functioned as liaison 
and as the data-supplying agency. The Bureau had, before 
September, 1949, attempted to provide some minimum of 
training, mostly through one to four short lectures 
covering skills in test interpretation, concept of the counselor’s 
role, specific information, etc. (This was rather limited in scope.) 

The advisers were on the job. They had been given the 
cumulative folders, they were seeing students, and many of 
them felt that they could not make use of the information 
the Bureau was furnishing them. There was a strong need for 
holding the cooperation of the faculty gained over the last 
few years, and a strong need to move the program forward on 
more solid ground. With this in mind, the author, with the 
support of the Dean of Students and the Counseling Bureau 
staff, decided to institute a volunteer training program for the 
faculty, using small group procedures as the method of 
instruction. 

Our major philosophy governing the operation of these 
groups was that of democratic group procedures. We desired 
to keep the situation free and permissive so that the groups 
would feel free to move along the lines they wished, and at 
the rate they wished. We desired that participation remain on a 
voluntary basis, so that those who wished to join could really 
feel in harmony with the program. We wanted the situation 
to be one in which negative and hostile feelings, personal 
feelings, could be expressed. We intended to place responsibility 
for learning in these groups where we felt it belonged—on the 
advisers. The training program, therefore, was built around 
group discussion, role playing and live experiences. 

We believed, and our experience has substantiated our faith, 
that these groups, on their own initiative, would cover the 
areas that they considered the most significant to them, and 
that there would not be much discrepancy between what the 
professional counselors considered important, and what the 
advisers considered important. On this belief, we did not 
intrude our ideas on what should be covered, or attempt to 
indoctrinate the participants with any one counseling point of 
view. We felt our role to be that of supplying resources when 
asked, giving aid on how to discuss what they had decided was 
pertinent, and, after the collections of people had become 
groups, to participate as members in the full sense of the word. 

This idea was presented to the faculty advisers at a meeting 
on September 7, 1949. The basic philosophy behind the program 
was included in the presentation, as well as a partial list of the 
possible areas to be covered. Ninety-seven advisers, including 
many department heads, and all save one of the assistant 
Deans of the various schools on the campus volunteered to 
participate. A breakdown of the figures reveals the following 
information: 65 per cent of the advisers in the School of Home 
Economics attended sessions, 41 per cent of the advisers in the 
School of Engineering, 34 per cent of those in Agriculture and 
24 per cent of those in Arts and Sciences participated in the 
first semester. 

They were divided into training groups on the basis of the 
amount of time they had free, the length of time they wished to 
participate, the hours of the week available, and the depart¬ 
ments in which they taught. An attempt was made to make 
the membership of each group a heterogeneous one, so that 
there could be a free exchange of ideas and information among 
the people with varied backgrounds and training. 

Three groups, consisting of a total of thirty members, met 
for one session a week, Three, consisting of thirty-five members, 
met for one session every other week. The author was the 
resource person for these six groups. Two groups, totaling 
eighteen members, met once a week for five meetings before the 
advising period, and one meeting during the period. They 
worked with Professor Paul Torrence, Bureau Director.
These people, because of time pressure, decided to meet for
this short length of time. Two groups, of thirteen members,
met once a month with Miss Dorothy Mitchell of the Bureau.
They held a few extra meetings.

As in all situations, there were several limiting factors. It
was not possible to arrange for any financial or released-time 
incentives for participants in the training program. Indeed 
all faculty advising is “extra” work without financial compen¬ 
sation. The utilization of time proved to be a difficulty. Meeting 
times had to be arranged to suit most participants, and some 
who wished to join were unable to do so. 

Since the College serves the entire State of Kansas, many 
had to miss meetings because of extension or other obligations. 
Faculty members are often used by state and local agencies in 
consulting, judging, and other roles. Many attempted to 
attend other group meetings to make up, but there was some 
loss of continuity and group unity because of this. 

There were some powerful positive factors in operation that 
more than counterbalanced the above limitations. The faculty 
advisers felt that such a program was needed; many felt that
they were inadequate. There was a feeling, more covert than 
overt, that such a program could contribute to personal 
growth as a teacher and as an individual. 

The students, through their planning conferences, had 
made recommendations that faculty counseling was essential, 
and that advisers should be trained. The deans of the respective
schools were interested in the creation of the program. All 
concerned displayed a strong spirit of cooperation. 

One difficulty that presented itself after the program had 
started, and one that we had anticipated, was the normal 
one that arises when any group of people, competent in their 
own fields, are called upon to undertake new learnings and to 
use new procedures quite removed from their own. For example, 
many of the advisers have come from the physical-science and 
technological areas where they have long been trained in 
individual research, and where they have conducted classes 
on a lecture as well as laboratory basis. Group thinking and 
group processes were an essentially trying experience for many 
of them at the beginning. The author feels, however, that by
the use of process observers, and by the resource person from
the Bureau expressing these “trying” and negative feelings
when they arose, this difficulty was mostly overcome.

What did the groups discuss and do? The following is a list
of topics, discussed by at least two-thirds of the groups, and
selected out of the protocols: 

1. Test Interpretation 

(a) Meanings of tests, test scores 

(b) How to apply the information, interview techniques 

2. Philosophy of Education 

(a) Who should go to college? 

(b) Responsibility of College toward student 

(c) General education 

(d) Curriculum construction 

(e) Entrance requirements

3. The Problem of the Marginal Student—low aptitude, low
ability, high level of aspiration 

4. The Role of the Faculty Adviser 

(a) Responsibility to institution, to student, to self 

(b) Relationships with students 

5. The Problem of increasing student contacts

(a) Use of social gatherings, called by adviser at home 

(b) Use of upper-class students as group leaders of 
Freshmen groups (Home Economics School) 

(c) Other schemes 

(d) Where does responsibility for initiating contact lie? 

6. Teaching Methods 

(a) How do you teach students to accept responsibility, 
think critically? 

(b) How do you create student interest? 

(c) Grading and testing 

(d) Group Procedures 

(e) Student rating of the faculty 

7. The Dynamics of (Student) Behavior 

(a) Discussion of specific cases 

(b) Role-playing-dramatized interviews 

This represents only those areas covered by most of the 
groups. There were some groups which covered other topics, 
including the mental hygiene needs of instructors. On the whole, 
an analysis of the protocols and process charts shows a great 
deal of involvement in the program, many new ideas advanced, 
and much interchange of information among the members 
from the different schools. 

No program would be complete without attempts to evaluate. 
This process is still going on. Our first thoughts on evaluation 
included a pre-test and post-test battery, to measure several 
aspects of the program. The author created, and administered 
to the groups, a pre-test inventory consisting of three parts.
There was no opposition on the part of the faculty after the
purpose of the battery was explained, and adequate safeguards 
taken to insure comparative anonymity. The first part was 
designed to measure the individual faculty member’s concept 
of what his role is in counseling, and his attitudes toward 
students. This part was a sentence-completion exercise of 
twenty-five items. We are still evaluating the returns on this, 
and the entire Bureau staff is rating the answers in an attempt 
to cut down on the limitations of such a projective device.

The second part was an information exercise, and the third, a 
miniature case study. The questions raised in the latter were 
revised from Strang’s list in Counseling Technics in College and
Secondary Schools, and were designed to be useful in training 
as well as evaluation. This was included on the theory that 
a utilization of knowledge in an organized, integrated fashion 
is necessary for effective counseling of the type done by 
advisers. 

Only the Sentence Completion Test was re-administered at 
the end of the semester. It was felt that the other measures 
were too static and not valid in this type of program. The 
third part was used for discussion and as role-playing material 
in some of the groups. 

In addition to this test procedure, reports of the content and 
process of the meetings have been kept, with a member of the 
group acting as process observer, using mimeographed material 
as a guide, and keeping a participation chart, while the resource 
person acted as recorder. Readings of these protocols show 
movement and positive changes and will be used to show 
growth in concept and understanding. Some recordings of 
role playing and discussion were made, and these will be 
used, too. 

At the end of the semester, the author created an evaluation 
questionnaire that was sent to all the participants. The analysis 
of returns is still in process, but the evidence tends to show that: 

1. We have a firm base on which to build additional training 
programs at Kansas State and other comparable institutions. 

2. The program has had repercussions in the classroom teaching
of the participants. 

(a) Group dynamics procedures have been adapted for 
classroom use and experimentation in classes such
as senior mechanical engineering laboratory, freshman
classes in personal health, classes in journalism,
education, foods and nutrition, industrial management,
and others.

(b) A concern for [illegible] of the behavior of
students has [illegible] and other teaching
procedures.

3. Relationships with the Counseling Bureau, and use of its 
facilities by faculty have increased. 

4. Advisers feel more adequate in their handling of test 
data, and have made use of the learning in interviews with 
students. 

5. The advisers feel that they now recognize that more
responsibility must rest with the student, both in counseling
and in class work. 

The evaluation by the faculty also shows that they gained 
much from the heterogeneous make-up of the groups, from the 
method, and from the total approach. Not all was sweetness 
and light, however. Of course, some faculty members, because 
of their own personality, or because of their long years of 
training, felt that such group procedures did not meet their 
needs. Some felt they had come for the facts, and that they 
did not get them presented: one, two, three. One wrote on his 
evaluation sheet: “I went into the program in order to have 
some expert counselors give me some information on counseling. 
. . . I was not interested in serving as a guinea pig for an
experiment in group psychology. If you ever decide to give the 
advisers some pointers on counseling they can use, I shall be 
glad to participate.” This faculty person attended only one 
session and withdrew. He represents an extreme minority. 

Although the evaluation process is incomplete at this time, 
the Bureau members feel that the program has been a success. 
One further indication we have is that five groups are going 
strong in this second semester. We had decided to terminate 
the program before Christmas, but none of the groups decided 
to do so. These present groups are all meeting once a week, 
because we found that to be the best arrangement in our 
situation. We found that the groups which met each week far 
outdistanced the others in terms of group unity, content 
covered and all-round participation and satisfaction. 

We in the Bureau know that we have learned much from 
the advisers, and have gathered many excellent suggestions
from them. We know that our knowledge of group procedures
has grown greatly from the experience. We have learned much
about our role in such groups, and about the faculty expectations
of such a role. We are now using that learning in the
spring groups. Perhaps it would be more exact to call this a
"cooperative learning program" rather than a faculty training
program.

We feel that the use of the knowledge of small group dy¬ 
namics in creating and operating a large-scale training program 
for advisers is practical and successful, and that it can be 
applied effectively in other institutions. We believe such a
program rests upon the extension of the application of personnel
techniques by the counselor to the faculty. If the counselor
respects his faculty colleagues, works with them in a democratic
fashion, and attempts to meet their needs, he can secure faculty
cooperation and participation in advising and training.



A GENETIC STUDY OF SOCIALITY PATTERNS OF 
COLLEGE WOMEN 

DAVID S. BRODY
Montana State University 

Introduction 

The present research represents an exploratory study of 
some of the underlying factors determining sociality patterns 
of 140 freshman college women living together in a dormitory 
residence at Montana State University during the academic 
year 1948-49. Sociometric data employed at the residence 
halls in reassigning roommates after a period of six months 
were utilized as a criterion of sociality. Each girl was asked to 
list the names of all girls in the dormitory she would like to 
have as roommates as well as the names of girls she did not 
desire as roommates. Since the girls knew that the data would
be actually used for room assignments, maximum cooperation 
was secured. During the first week of the Spring Quarter, 
after the girls had moved to their new rooms, they were asked 
to rate the other girls in the dormitory on three traits: leader¬ 
ship, social qualities, and work habits. In addition, each girl 
filled out an inventory indicating the extent of participation in 
various home duties and in individual and group activities 
prior to her entry in college. She also filled out a Questionnaire
pertaining to the parents’ attitudes and their supervision of
activities prior to her entry into college. Data on the Minnesota
Multiphasic Personality Inventory, which was administered
at the beginning of the academic year, were also utilized in 
the study. 

Additional data on participation in individual and group 
activities in college and a measure of student satisfaction with 
college life were secured toward the end of Spring Quarter. 
However, these data have not as yet been analyzed.

Results on Item Analyses

The initial step in the analysis of data consisted of tabulating 
the number of times each girl was accepted as a roommate.
The number of acceptances for each girl ranged from 1 to 17
with a mean of 7.6 choices. 

On the basis of this tabulation, two groups of 30 girls, each 
representing approximately the lowest and highest 10 per cent 
in the distribution, were isolated. 

For purposes of discussion these groups will be referred to as
the “Unaccepted” and “Accepted” groups. Each girl in the
unaccepted group received four or less choices as a roommate 
and each girl in the accepted group received ten or more 
choices. 

Employing these two groups, an analysis was made of each 
of the items on the Inventory and Questionnaire. It was 
found that a number of items did differentiate between the 
two groups at the 5 per cent level of probability or better. 
Altogether, a total of 0 .41 items were employed in this explora¬ 
tory study and approximately 10 per cent were found to be 
significant. 
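
The item analysis described above can be illustrated with a brief sketch. The paper does not state which significance test was applied, so the Pearson chi-square for a 2 x 2 table used here is an assumption, as are the function names and the response counts, which are hypothetical. For a single response category of one item, the sketch compares the number of girls checking it in each group of 30 against the 5 per cent critical value for one degree of freedom (3.841).

```python
# Illustrative only: the test and the counts are assumptions, not the author's data.

CRITICAL_5_PERCENT_DF1 = 3.841  # chi-square critical value, 1 df, 5 per cent level


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def category_differentiates(accepted_yes, unaccepted_yes, group_size=30):
    """Return (chi-square, True if it reaches the 5 per cent level)."""
    chi2 = chi_square_2x2(
        accepted_yes, group_size - accepted_yes,
        unaccepted_yes, group_size - unaccepted_yes,
    )
    return chi2, chi2 >= CRITICAL_5_PERCENT_DF1


# Hypothetical: 20 of 30 accepted girls and 10 of 30 unaccepted girls
# checked "much or very much" on some item.
chi2, significant = category_differentiates(20, 10)
```

With these invented counts the statistic is about 6.67, which exceeds 3.841, so the category would be counted as differentiating between the two groups.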

Items included in the Inventory indicating the extent of 
participation in various activities prior to entry in college were 
classified under three headings: 

Part I. Participation in individual and informal group ac¬ 
tivities. 

Part II. Participation in home duties. 

Part III. Participation in formally organized group activities. 

Each item in Part I and Part II was checked by the student 
in terms of frequency. (For purposes of item analysis, three 
categories were employed—namely, none or little, some, and 
much or very much.) In Part III, pertaining to formally organized
group activities, four categories were employed:

(a) no participation 

(b) member in name only 

(c) participating member 

(d) officer or committee chairman 

Analyses were made separately for each category within 
an item. 





In Part I, the following seven items showed significantly 
greater participation on the part of the accepted group: 

(a) attending movies 

(b) swimming 

(c) going out on dates 

(d) touchball 

(e) hiking 

(f) social dancing 

(g) visiting friends 

Significantly greater participation on the part of the un¬ 
accepted group was shown by the following two items: 

(a) playing checkers or chess 

(b) reading 

In Part II, all of the significant items showed greater partici¬ 
pation on the part of the accepted group. These items are: 

(a) selected new clothes for myself 

(b) laundered 

(c) made my own bed and straightened out my room 

(d) painted (furniture, walls, etc.) 

(e) canned fruits and vegetables 

(f) cleaned house 

(g) washed and wiped dishes 

(h) chores around barns 

(i) worked in fields (ploughing, sowing and harvesting) 


Similarly in Part III, the items yielding significant differ¬ 
entiation showed more participation for the accepted group. 
These items are: 


(a) student government 

(b) high school fraternity or sorority 

(c) school athletic team 

It should be emphasized that a number of other items showed 
consistent differentiation between the unaccepted and accepted 
groups for each of the categories, but fell somewhat short of 
meeting the 5 per cent level of significance. (Data for these 
items will be presented in a subsequent paper.) There would 
appear to be an important difference in the extent of home 
responsibilities between the two groups of girls. In general, 
girls in the accepted group reported that they fulfilled home
responsibilities to a much greater extent than girls in the
unaccepted group.

The Questionnaire on Family Background included items 
designed to indicate parents’ attitudes and their supervision of 
activities prior to entry in college. The first section of this 
Questionnaire consisted of 42 items, twenty-one of which 
pertained to the father's attitudes and the remaining twenty- 
one to the comparable attitudes of the mother. 

Of this group of items, a significantly greater proportion of 
the accepted group indicated that: 

(a) My father provided me with a regular allowance 

(b) While attending high school, my father expected me to 
participate in social activities 

Whereas, more of the unaccepted group indicated that: 

(a) My mother expected me to work for pay outside the home 

(b) My mother tried to push me ahead and to make me excel 

(c) My mother emphasized the importance of good manners 

(d) My father selected clothes and other personal articles 
for me so I wouldn't make mistakes 

Included in this section were 13 additional items measuring 
parent-child and sibling rapport. These items were adapted 
in part from Terman’s study 1 on the prediction of marital 
happiness. Girls in the unaccepted group indicated a sig¬ 
nificantly greater degree of conflict both with their fathers 
and with their mothers than did girls in the accepted group. 
They likewise showed a greater amount of conflict with their 
brothers and sisters. 

Another series of items on family background was designed 
to indicate the type of control exercised by the parents relative 
to 21 different areas of activities. The students were asked to 
check the type of control exercised by the mother and father 
separately. 

1 Terman, Lewis M., and Others. Psychological Factors in Marital Happiness.
New York: McGraw-Hill Book Company, 1938.

The items appear to indicate less stringent control for girls
in the accepted group. However, before generalizations can
be drawn from the data, analyses in terms of weighted scores
are indicated. This step has not yet been taken. Preliminary
analyses indicate that the significant areas are:

(a) choice of friends of the opposite sex 

(b) going out on dates 

(c) time of coming home from dates or parties 

(d) studying school lessons 

(e) cleaning my room and taking care of personal possessions 

The last section of items pertaining to family background 
dealt with the extent of agreement between the student and 
her father, and between the student and her mother, on the 
type of supervision of the same 21 activities listed under
parental control. Girls in the accepted group showed generally
greater agreement with their parents than did those in the 
unaccepted group. 

In summarizing the data on item analysis, it is significant 
to note that certain items appear to yield consistent differentiation
between the unaccepted and accepted groups regardless
of the context in which they are found. For example,
items concerning home duties, especially those involving
personal responsibilities, differentiated between the accepted
and unaccepted groups in the activity inventories, in the
Questionnaires on the parents’ attitudes and their supervision
of activities, and in the Questionnaire designed to measure
extent of agreement between child and parent. Likewise, 
items concerning association with the opposite sex differ¬ 
entiated between the two groups of girls on the activity 
inventories and on the Questionnaires. 

Results on Ratings of Leadership, Work Habits, and Social Qualities

As was indicated earlier in the paper, each girl was asked 
to rate all the other girls in the dormitory on leadership, 
work habits, and social qualities. The students were instructed 
to place a check mark in front of a girl's name and go on to the 
next name if they had had no opportunity to observe that 
girl or did not know her well enough to rate her.

Each rating was weighted from 1 to 5, 1 representing the
lowest rating, and 5 the highest. Ratings for each girl were
tabulated and the mean of all the ratings she received was
computed. The mean rating represented her score on the
particular trait. For each of the three traits the mean ratings
were symmetrically distributed and approximated
normality. The girls were rated most frequently on social
qualities and least frequently on work habits. The number of
girls who were not sufficiently well known to be rated on social
qualities was 24 or 17 per cent, on leadership 31 or 22 per cent,
and on work habits 59 or 42 per cent. Thus, the girls as a 
group felt that they were least able to evaluate others on the 
basis of work habits and most able to evaluate others on the 
basis of social qualities. 
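
The scoring just described can be sketched briefly. The function names and the sample ratings are hypothetical; only the 1 to 5 weighting, the group size of 140, and the three counts of unrated girls (24, 31, and 59) come from the text. A skipped rating is represented here by None, which is an assumption about how the check marks might be coded.

```python
from statistics import mean

GROUP_SIZE = 140  # freshman women in the dormitory, as reported


def trait_score(ratings):
    """Mean of the 1-5 ratings a girl received; None marks a rater who skipped her."""
    given = [r for r in ratings if r is not None]
    return mean(given) if given else None


def percent_of_group(count, group_size=GROUP_SIZE):
    """Express a count as a whole-number percentage of the group."""
    return round(100 * count / group_size)


# Hypothetical ratings received by one girl on one trait.
example_score = trait_score([4, None, 5, 3])

# Counts of girls not well enough known to be rated, as reported in the text.
not_rated = {"social qualities": 24, "leadership": 31, "work habits": 59}
percentages = {trait: percent_of_group(n) for trait, n in not_rated.items()}
```

The computed percentages are 17, 22, and 42, which agree with the figures reported for social qualities, leadership, and work habits respectively.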

The number of acceptances each girl received was correlated 
with the mean rating on each of the three traits. The highest 
correlation with acceptance scores was obtained for social 
qualities. This correlation was .59. The correlation with 
leadership was .52 and with work habits, .20. Ratings for 
social qualities and leadership are certainly significantly 
related to acceptance scores, but it is obvious that social 
qualities and traits of leadership are by no means the only 
factors determining acceptability. The possession of good 
work habits is apparently of minor importance in the selection 
of roommates. 
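
The correlations reported here can be illustrated with a short sketch. It is assumed, since the paper does not say, that the coefficients are ordinary product-moment (Pearson) correlations; the function name and the six pairs of scores are hypothetical. The last lines square the three reported coefficients to show the proportion of variance in acceptance scores that each rating would share under that assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the variables has no variation")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: number of acceptances and mean social-qualities rating.
acceptances = [2, 4, 7, 8, 10, 13]
social_ratings = [2.1, 2.8, 3.0, 3.6, 3.4, 4.2]
r_example = pearson_r(acceptances, social_ratings)

# Coefficients reported in the text, squared to give shared variance.
reported = {"social qualities": 0.59, "leadership": 0.52, "work habits": 0.20}
shared_variance = {trait: round(r * r, 2) for trait, r in reported.items()}
```

On this reading, social qualities would share about 35 per cent of the variance in acceptance scores, leadership about 27 per cent, and work habits about 4 per cent, which is consistent with the statement that these traits are by no means the only factors determining acceptability.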

When the correlations were computed between work habits 
and ratings on leadership and on social qualities, they were 
found to be .49 and .32 respectively. Thus, we find that leadership
correlates almost as highly with work habits as with
acceptability. 

Although there is a significant relationship between work 
habits and social qualities, it is considerably lower than 
between work habits and leadership. However, leadership and 
social qualities are highly related to each other as evidenced 
by a correlation of .85. It can be hypothesized that social 
qualities are an important determinant in the selection of 
leaders among this population, but that work habits constitute 
another variable which is significant. 

Results on Minnesota Multiphasic Personality Inventory

Since all members of the freshman class were given the 
MMPI at the beginning of the Fall Quarter, 1948, as part of the
Orientation Week Testing Program, it was decided to utilize
these records to determine whether the accepted and un¬ 
accepted groups could be differentiated in terms of this 
personality inventory. 

In the present study, 28 MMPI records were available on 
the unaccepted group and 30 on the accepted group. On the 
keys pertaining to the specific personality variables, it was 
found that the mean scores of the unaccepted group were 
consistently higher than those for the accepted group, with the 
exception of the mean score on the hypomania scale. When 
the ‘t’ values were calculated, only one was found to be sig¬ 
nificant at the 5 per cent level of probability or better; this 
was the scale pertaining to schizophrenia. Since the schizo¬ 
phrenia scale is theoretically measuring withdrawal tendencies, 
it is the one on which we might have expected to attain the 
maximum differentiation. 

When we compare the standard deviations, we find that the 
unaccepted group is consistently more variable than the 
accepted group on all the scales except depression. When these 
differences in variability were tested for significance by the 
calculation of the F ratio, the psychopathic deviate, the 
schizophrenia and hypomania scales showed significant 
differences in variability at the 5 per cent level or better. 
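
The two comparisons described above, one of means and one of variability, can be sketched as follows. The paper does not give its computing formulas, so the pooled-variance form of t and the larger-over-smaller form of the F ratio are assumptions, as are the function names and the scale scores, which are invented for illustration. The resulting statistics would be referred to tables of t and F with the appropriate degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """t for the difference between two independent means, pooled variance assumed."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def variance_ratio(group_a, group_b):
    """F ratio of the two sample variances, larger variance in the numerator."""
    var_a, var_b = variance(group_a), variance(group_b)
    if min(var_a, var_b) == 0:
        raise ValueError("a group has zero variance")
    return max(var_a, var_b) / min(var_a, var_b)


# Hypothetical scores on one scale for a few girls in each group.
unaccepted_scores = [48, 55, 62, 70, 51, 66, 74, 58]
accepted_scores = [50, 52, 49, 55, 53, 51, 54, 48]

t_value = pooled_t(unaccepted_scores, accepted_scores)
f_value = variance_ratio(unaccepted_scores, accepted_scores)
```

In this invented example the unaccepted group has both the higher mean and the larger variance, so t is positive and F is well above 1, which mirrors the direction of the differences reported in the text without reproducing the actual values.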

The consistency of these differences in means and standard 
deviations suggests that the members of the unaccepted 
group are more likely to have personality disturbances than the 
members of the accepted group. The greater variability of 
scores for the unaccepted group is a reflection of the larger 
number of deviant scores in the direction of abnormality. 

Differences in variability on the L, F, and K scales were all 
significant at the 1 per cent level or better; the unaccepted 
group being more variable on the F & K scales and the accepted 
group more variable on the L scale. Differences in mean 
scores on these three scales were not significant. 

Summary and Conclusions 

The data which have been presented in this paper suggest 
that the student’s experiences within her family group and the 
pattern of her activities prior to entry in college are important 
determinants of her social acceptability. For example, girls in
the accepted group indicated significantly greater participation
in home duties, especially those involving personal responsi¬ 
bilities. In a comparison of activity participation outside the 
home, girls in the accepted group showed more participation in 
social activities whereas girls in the unaccepted group showed 
more frequent participation in relatively solitary activities. 
The evidence also points to the fact that the parents of the 
unaccepted group tended to be overprotective and to discourage
the development of independence. The girls in the accepted
group felt that their parents encouraged social development
to a greater degree than was true of the girls in the unaccepted
group. 

Apparently girls in the accepted group came from homes 
in which there was less conflict and greater harmony than 
was true of the homes of girls in the unaccepted group. The
results on the MMPI suggest that girls in the unaccepted 
group evidence a greater tendency toward abnormality. 

Reilly and Robinson* in their study of popularity among 
college women point out the importance to counselors of 
obtaining some index of the probable social acceptance of an 
entering freshman. Their report shows that the usual college 
entrance data are relatively ineffective for predicting social 
acceptability. Of interest is their recommendation that aca¬ 
demic census data need to be supplemented by more vital 
statistics from the adolescent world. Certainly, the present 
study points to the value of this approach. For the personnel 
worker it means that if he is to understand the dynamic 
factors underlying social behavior at the college level, he 
must orient his thinking in terms of the developmental history 
of the individual. 

* Reilly, J. W. and Robinson, F. P. "Studies of Popularity in College: I—Can
Popularity of Freshmen be predicted?" Educational and Psychological Measurement,
VII (1947), 67-71.



HOW TO GO ABOUT THE PROCESS OF EVALUATING 
STUDENT PERSONNEL WORK 


WILLIAM M. GILBERT 

Director, Student Counseling Bureau, University of Illinois 

The title of this paper is somewhat misleading and needs 
to be clarified. The title implies that there is a specific process, 
that student personnel services can be neatly defined, that there 
are perfectly valid criteria for determining the effectiveness and 
efficiency of these services and, finally, that the necessary for¬ 
mula for going about the process of evaluation can and will be 
supplied in cook-book fashion. Unfortunately, not one of these 
implications is justified. There is no one most-desirable way of
going about the evaluation process. Student personnel services
cannot be defined at all neatly; there are no criteria known to
be perfectly valid and I have no secret formulas. 

With these few positive statements I should possibly end this 
paper dramatically and sit down. However, student personnel 
services will not continue to be accepted on faith indefinitely. 
Eventually some discerning college or university administra¬ 
tor will rightly ask: Just what are the purposes of student per¬ 
sonnel service and what is the evidence that these goals are 
being attained, or how can we get this evidence? We could not 
avoid the issue even if we wanted to. 

Mr. Blaesser, in his statesmanlike and visionary address of 
last year, and after reviewing the various attempts at over-all 
evaluation of student personnel programs, faced the issue 
squarely. He emphasized: “This means a total institutional
study of the needs of the students coming to the institution, 
and the evaluation of the outcomes of the total educational 
experiences at the institution.” 

It was rightly explained by Mr. Blaesser that it would be a 
“Shangri-la” university where such far-reaching, highly coop¬ 
erative and expensive over-all and long-range evaluation of 
higher education could be carried on.

In the meantime, before such a university evolves, the college
or university administrator is still going to have to allot
funds to the various student personnel agencies and he is still 
going to want to know what evidence there is that the different 
objectives are being reached. He will probably not insist on 
perfectly valid evidence because he will be one of the first to
recognize that perfection can be aimed at but cannot be expected
in broad educational endeavors. And we, as personnel
workers who are sincerely interested in the welfare of students, 
will want to know how effectively and how efficiently we are 
serving their welfare. 

It is not possible or polite for me to make judgmental state¬ 
ments about the different colleges and universities you repre¬ 
sent. However, I am sure it will not be held against me by 
President Stoddard and Provost Griffith if I make the simple 
observation that while the University of Illinois is one of the 
great universities, it is certainly not a “Shangri-la” university.
We have our problems too, as most of the rest of you do. There
appear to be good spots and not-so-good spots in our over-all 
student personnel program. Most of you probably have what 
appear to be good spots and not-so-good spots in your programs 
too. The desirability of some type of general evaluation is prob¬ 
ably quite clear. One of the problems is how one should go 
about this process. Perhaps, by discussing some of the proce¬ 
dures which have been used at Illinois and some of the plans 
and hopes we have, ideas for developing evaluation procedures 
which will fit your own local situation may occur to you. Con¬ 
versely, any suggestions you have for us will be deeply wel¬ 
comed. 

One of the first problems to be faced both chronologically 
and in terms of importance is that of securing general, grass¬ 
roots acceptance of any type of evaluative procedure. 

In most colleges and universities, evaluation immediately 
poses a number of serious problems which must be faced. When 
we evaluate counseling services, when we evaluate registration 
procedures, when we evaluate health services, and when we 
evaluate instructional services, we are evaluating not simply 
services, but, perhaps even more importantly, we are evaluating
the persons who are responsible for such services and the
persons who perform the services. As personnel workers we
should probably be the first to recognize that many of our
student personnel are not as good as they should be and that 
any evaluation of them immediately can serve as a threat to 
the individuals concerned. 

Several years ago the Student Counseling Bureau at the 
University of Illinois conducted a questionnaire study of stu¬ 
dent attitudes regarding the effectiveness of the counseling 
services they received. This Questionnaire, which went out to 
some 3000 students, was devised by members of the full-time 
psychological staff with full consideration given to suggestions 
made by the trained part-time faculty counselors who are a 
part of the Bureau staff. Nevertheless, faint rumblings of con¬ 
cern came to my ears, and since the purpose of the investiga¬ 
tion was not that of evaluating individual counselors, but the 
program as a whole, the counselors were reassured that there 
would be no individual breakdown of the findings. Nor was 
there. 

Just a year ago, in response to the recommendations of a 
committee appointed to study the problem of the recognition 
of faculty counseling, the Bureau was given the responsibility 
and privilege of making formal recommendations, as to in¬ 
creases in regular academic salary and rank for Faculty Coun¬ 
selors. The exact statements in Provost Griffith’s letter of 
March 2j, 1949, are sufficiently noteworthy to deserve quota¬ 
tion: 


This whole problem has been studied recently by a special 
committee appointed for the purpose. I am approving the 
recommendations of this committee as follows: 

1. Policy. A positive program of counseling services to stu¬ 
dents based on the best clinical and guidance practices 
has become and should remain an integral part of the educational
experiences we offer to students. The persons who
do this type of work well should be rewarded for it and
advanced in rank and salary in proportion to their excellence.

4. Rank and Salary. Recommendations for changes in rank 
and salary of personnel listed in the budget of the Student
Counseling Bureau, insofar as counseling services are
concerned, will originate with the Director of the Bureau
and college offices to the general administration.

These significant forward steps in student personnel practice
seemed to deserve a very careful consideration of any recom¬ 
mendation for increases in rank and salary that would be made. 
Consequently, within the past several months the problem of 
improving the Director’s evaluation of the counseling effective¬ 
ness of individual counselors was presented to the group. It 
was decided that an Evaluation Committee should be elected 
consisting of two of the faculty counselors and two of the cen¬ 
tral staff members. The general theoretical and practical prob¬ 
lems of evaluating the effectiveness of counseling were con¬ 
sidered democratically but briefly at a general staff meeting. 
The Evaluation Committee then went to work and presented 
a series of recommended evaluation procedures. These were 
then discussed at another general staff meeting. 

The next step consisted in having the counselors check the 
evaluation procedures which had been recommended by their 
Committee. Their checked lists were sent in without signature. 
I should like to report some of the conclusions and recommen¬ 
dations of the evaluation committee: 

I. The committee members agreed on the following statements
as starting points affecting all recommendations on
specific methods:

1. Though various attempts to evaluate counseling have
been reported in the literature, the validity of no
method has been established.

2. No single method should be used as the sole basis
of evaluating counseling.

3. Every method used should be on a trial basis. 

4. Training, supervision, and evaluation are inseparable. 

5. Outcomes or whatever methods are used to evaluate 
counseling may serve as guides to further training of 
counselors. 

6. We do not feel it necessary to recommend specifically
such obvious, continuous, and informal procedures as 
evaluating counselors for regularity and dependabil¬ 
ity in attending to duties, cooperation in the work of 
the Bureau, private consultation with the Director, 
participation in staff discussion, research, and per¬ 
formance of special duties such as taking part on the 
staff programs, work on committees, and the like. 
We feel that our assignment is to suggest more formal, 
specific, objective, special-occasion procedures to sup¬ 
plement these informal ones. 

7. Morale of the staff and, therefore, of each Counselor
is a prime consideration in the selection and application
of procedures.

8. Each staff member should feel free to submit additional
evidence (such as recordings, additional participation
in fake interviews, etc.) in his own behalf
and beyond whatever evidence would otherwise be 
used in evaluating his work. 

II. We recommend that the present methods of evaluation 
by the Director be continued, and that the staff con¬ 
sider additional methods as possible supplements. 

III. We recommend to the staff for consideration: 

1. Intake conferences. Staff members would meet in small
groups on Wednesdays when no meetings of the entire
staff are scheduled. A central staff member would lead
the group. Staff members would summarize their work
since the previous meeting. Specific problems could be
brought before the group for discussion. The Director
would divide his time among the groups.

2. A survey of clients by mail questionnaire. This should
cover each counselor’s entire client group for a given
semester, with the client anonymous and the counselor
identified on the outgoing questionnaire. We would
suggest that a committee be appointed to make the
Questionnaire and that the committee include those 
staff members who worked on the similar question¬ 
naire used previously. 

These recommendations received the unanimous approval of 
all Counselors who then suggested that the Questionnaire sur¬ 
vey of several years ago be analyzed further to determine 
whether the type of Questionnaire used would actually dis¬ 
criminate between different counselors. 
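
The question raised by the counselors, whether a questionnaire discriminates between counselors, can be put in statistical terms. The sketch below is only one possible approach and is not the analysis the Bureau carried out: it computes a one-way analysis-of-variance F ratio, comparing the variation between counselors' mean ratings with the variation among clients of the same counselor. The function name `f_ratio`, the counselor labels, and the ratings are all hypothetical.

```python
from statistics import mean


def f_ratio(ratings_by_counselor):
    """One-way analysis-of-variance F ratio for ratings grouped by counselor.

    A large value suggests that mean ratings differ between counselors by
    more than the spread among clients of the same counselor would explain.
    """
    groups = [g for g in ratings_by_counselor.values() if g]
    k = len(groups)                      # number of counselors
    n = sum(len(g) for g in groups)      # total number of client ratings
    if k < 2 or n <= k:
        raise ValueError("need at least two counselors and more ratings than counselors")

    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("no within-counselor variation; F is undefined")

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical ratings on a five-point scale, four clients per counselor.
ratings = {"A": [4, 5, 4, 5], "B": [2, 3, 2, 3], "C": [3, 4, 3, 4]}
print(round(f_ratio(ratings), 2))  # 12.0
```

The ratio would still have to be referred to an F table with k - 1 and n - k degrees of freedom, and a significant ratio shows only that the ratings differ by counselor, not why they differ.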

These few examples indicate the importance of securing the 
acceptance of the persons involved in any evaluation procedure 
and the importance of attempting to minimize any feelings of 
threat which evaluation would involve. It is assumed that not 
all threatening aspects of an evaluation procedure can be elimi¬ 
nated completely. If one attempts to eliminate all possibility 
of threat, then it is probable that a laissez faire policy will ensue 
which will result in no progress.

Even though the evaluation of individual counselors in a 
counseling bureau presents the issue of securing the acceptance 
of evaluation in its most critical form, it is still a considerable 
distance removed from the general goal of securing acceptance 
for an over-all evaluation of all student personnel agencies. At
least a tentative acceptance of the desirability of an over-all
evaluative procedure by the persons and agencies who would
be evaluated would be desirable even before the appointment
of an evaluation committee such as that suggested in Mr.
Blaesser’s address of last year. It would be possible, of course,
for some interested agency, such as the Counseling Bureau,
which has already carried on a self-evaluative procedure, to 
recommend to the higher administration that such a committee 
be appointed. If such a recommendation were acted upon fa¬ 
vorably, in the absence of prior consultation with the various 
student personnel services, it seems possible that unnecessary 
protests and eventual lack of real cooperation from some of the 
agencies would be the result. 

At Illinois this second step in evaluating student personnel 
work, that is, the appointment of an over-all evaluation com¬ 
mittee, will probably be approached in a somewhat different 
manner. In connection with the authority given the counseling 
bureau to make recommendations with respect to increases in 
rank and salary for faculty counselors there was also appointed, 
at the request of the Bureau, an advisory council. I quote from
the Provost's letter again: 

An Advisory Council to the Director of the Student Coun¬ 
seling Bureau is authorized, this Council to be composed of a 
representative of each college and school. Membership in the 
Council will be on a revolving basis with members appointed 
for three-year terms. For the first year, the one-year and the 
two-year and the three-year appointees shall be determined 
by lot. A vacancy will be filled by a staff member from the 
college or school which loses a representative on account of 
the rotating membership. 

In order to set up this Advisory Council, I should appre¬ 
ciate having an early nomination from each dean and director. 

This Council has been meeting with the Director of the 
Counseling Bureau each month during the present school year. 
One of the main problems which has been considered by the 
Council is the effectiveness of the various college registration 
advisory systems. These advisory systems do not appear to be 
of equal effectiveness in all colleges, a condition which has re¬ 
sulted in the publication from time to time of critical editorials 
in the school paper. As a result, the Advisory Council recommended
to the Director that questionnaire appraisal be made
of the various advisory systems with questionnaires being sent
to the students affected, the advisors, and to the academic
deans. The remainder of this paper will consist of a description
of plans and hopes which the Director of the Counseling Bureau 
now has. 

It is hoped that before any evaluation of the advisory systems 
is actually put into effect it will be possible to secure approval 
for an over-all evaluation of student personnel services. Specifically,
it is hoped that it may be possible to secure the adoption
of both the general evaluative procedure suggested by Dr. 
Kamm and Dr. Wrenn, and of the one which Dr. Kamm will 
describe to you later today. Securing the adoption of these pro¬ 
cedures or modifications of them will possibly not be an easy 
task. It is one which probably can be accomplished, however, 
provided the various individuals concerned have time to con¬ 
sider the proposals and are given the opportunity of making 
suggestions regarding them. 

The next step would be to recommend to the higher admin¬ 
istration that an over-all Evaluation Committee be appointed. 
This Evaluation Committee should probably consist of repre¬ 
sentatives of all of the various colleges and schools as well as 
representatives from all of the different student personnel agencies,
including the Dean of Students’ Office, the Office of Admissions,
the Health Service, the University Union which carries
on a broad program of student activities, the Speech Clinic, 
the Housing Division, and the Placement Bureau, and possibly 
student representatives. 

The third step in going about the process of evaluation would 
naturally follow from this second step. It would seem that the 
first task of the Evaluation Committee would be to discuss the 
results of the over-all, general evaluation of student personnel 
services and then to proceed to the problem of making a more 
detailed evaluative study of those services which appeared to 
be most in need of strengthening. The whole problem of criteria 
of effectiveness and efficiency of student personnel services 
would probably concern this Committee for some time. Since 
Dr. Strang will probably discuss with you the limitations of 
various criteria on the basis of which student personnel services
might be evaluated, I will not attempt to examine these questions
with you. It seems probable that the list of criteria suggested
in the revised brochure "The Student Personnel Point
of View" published by Dean Williamson's Committee under 
the auspices of the American Council on Education, as well as 
other more specific criteria, such as those suggested by Dr.
Aiken in his report to the Fourth Annual National Conference 
on Higher Education in April of last year, would be considered. 

It might be reasonably expected that the Evaluation Committee,
after considering the various criteria and various methods
of procedure which could be used, would refer the problem
to representatives of each of the different student personnel
services for further consideration and for recommendations as
to specific methods and procedures of carrying on an evaluation
program in their own agencies. 

While there are many possible objections to a Questionnaire
type of appraisal of student personnel service, it is one of the
few practical and not prohibitively expensive means for securing
some rough estimate of the apparent value of the service. There
is one point in connection with Questionnaire surveys which I
feel has not been adequately emphasized, and that is that a
student's responses to a Questionnaire will necessarily be in¬ 
fluenced by the knowledge which the student possesses, not 
only of the services which are actually available but of those 
which theoretically could be made available. Thus, as part of a 
Questionnaire appraisal of any given service it would seem ad¬ 
visable to supply the student with a description of what services 
might reasonably be expected from any given type of agency. 

From one point of view, at least, it may be fortunate that 
students are not more aware than they are of some of our more 
specifically stated objectives. It might be of considerable inter¬ 
est, for example, to submit to a representative group of students 
in any of our colleges and universities the eleven objectives of 
general education recommended in the report of the President’s
Commission and to have the students indicate on a simple scale
the degree to which they felt their general college education
had already helped them, or seemed to be in the process of helping
them, to reach these goals. The results could be startling.

After the more specific evaluation proposals of the different
student personnel services had been referred to the Evaluation
Committee for approval, and after the evaluations had been 
carried out, the fourth step in the process of evaluation would 
then confront this Committee. This fourth step would consist 
in the Evaluation Committee’s carefully examining and dis¬ 
cussing the results of the detailed evaluation of each agency and 
of their arriving at a series of specific recommendations which 
would be automatically transmitted to the Director or person 
in charge of the specific student personnel agency. Such a series 
of recommendations should, of course, be influenced by a func¬ 
tional analysis of all student personnel services with the view 
of expanding those services which needed expanding and of 
contracting those which seemed to be over-expanded. This, as 
most of you will recognize at once, is one of the most difficult, 
and delicate, and perhaps even dangerous steps in the whole 
process of evaluation. It is my own experience that practically 
every Director of any student personnel service is firmly convinced
that his service would improve immeasurably if it were
only expanded. This suggests, of course, that the Chairman of 
the Evaluation Committee should be a person of the greatest 
possible diplomacy, wisdom, and ruggedness. In addition it 
would be highly desirable if the college or university Admini¬ 
stration would be able to indicate, within some fairly definite 
range, at least, the total amount of funds which might reasonably
be expended for all student personnel services. It seems
possible that a wire recording of the proceedings of the Evalu¬ 
ating Committee at this point could provide some valuable 
research material for determining the extent to which the lead¬ 
ers of various personnel services were actually interested only 
in the welfare of students. 

The next step in the total evaluation process would consist 
of repeated and improved evaluations of the various personnel 
services at regular intervals. This should prove to be a relatively 
easy task if the other steps in the process already mentioned 
have been successfully negotiated.

The final step in going about the process of evaluation might 
then consist of an over-all basic evaluation of the outcomes of 
higher education including instruction. At this point, the Evaluation
Committee would probably have to be enlarged to include
other standing Committees in the university such as the Educational
Policy Committees, the Admissions Committee, and
others. It seems that any university which has actually reached 
this stage in the evaluation process should have little difficulty 
in securing the large funds necessary for the over-all evaluation 
of their educational program from any one of the national 
organizations which would subsidize research. In addition, that 
college or university should be placed at the top of some roll of
honor which would be devised by the American College Personnel
Association.

What has been said can be summarized in a few sentences. 
The way to go about the process of evaluating student person¬ 
nel services is to take account of what we know about people in 
general and to make full use of good democratic administrative 
procedures at every step in the process. If the process of evalu¬ 
ating student personnel services cannot be carried on in this 
fashion a very critical examination of the whole basic structure 
and functioning of the college or university itself needs to be 
accomplished first. 



MAJOR LIMITATIONS IN CURRENT 
EVALUATION STUDIES 

RUTH STRANG 

Professor of Education, Teachers College, Columbia University 

Evaluation is a complicated business. It necessitates (1)
clarifying goals or objectives; (2) devising methods and instruments
for securing evidence that each of these specific objectives
has or has not been attained; (3) gaining information
about the changes that have taken place in individuals, groups,
or community; and (4) passing judgment on the “goodness”
of the changes. An excellent review of the literature was published
in January, 1949, by Froehlich. 1

The evaluation of evaluation is still more difficult. This is 
because there are so many kinds of end results and processes 
to be evaluated—the personnel program as a whole, the ade¬ 
quacy of staff, the provision of certain services, the processes 
of counseling and of group work. Moreover, these are evaluated 
for different purposes and on different levels of scientific precision.
For example, a teacher may use informal evaluation
methods, such as obtaining from students a simple written 
statement regarding the effectiveness of his teaching or holding 
a group discussion of the methods used in the course. These 
suggestions for improving his teaching may be very useful in 
modifying instruction for the better even though they meet few 
of the criteria of scientific evaluation. The effective teacher 
continuously studies his students' progress toward the definite 
goals in the course. 

Despite its difficulty, evaluation of personnel work is neces¬ 
sary if the college personnel officer is to maintain his status. 
Administrators, the general public and students want to see 
results; they demand proof of the effectiveness of counseling 
and group work. 

1 Clifford P. Froehlich. Evaluating Guidance Procedures. Washington, D. C.: Federal 
Security Agency, Office of Education, 1949.

With the increased interest in evaluation in every area of
education, methods of evaluation of personnel work have been
improved. But because of the difficulty and complexity of ascertaining
changes produced by student personnel procedures
there are still major limitations in current evaluation studies— 
in surveys of the program as a whole, in evaluation of different
services, in appraising various kinds of counseling and psycho¬ 
therapy, and in the evaluation of group work procedures. 

Surveys of the Personnel Program 

Surveys of personnel programs tend to be either anecdotal or 
atomistic. The anecdotal type are valuable in giving glimpses 
of present practice which can be appraised theoretically. They 
fall short of adequate evaluation in being somewhat subjective:
the investigator may select the aspects that appeal especially
to him; if his mind-set is critical, he will focus on the unfavor¬ 
able procedures; if his mind-set is favorable, he is likely to note 
the incidents that will create a good impression. Almost every¬ 
one has an unconscious bias that is difficult to recognize and 
control. 

The detailed lists of criteria on administrative leadership, 
provisions and facilities for guidance, and in-service education; 
on the preparation and qualifications of the guidance staff, 
their growth in service; the specialized services available; the 
guidance and informational services available to students; the 
counseling and placement services; follow-up studies; relation 
of guidance to curriculum and instruction; use of community 
resources —this detailed analysis of the program is very useful 
in calling attention to the possible scope of the program and to
standards in training and performance. It falls short of effective 
evaluation in three important respects: 

1. It is too atomistic: it considers each item separately without
much attention to its relative importance and relation to
other items. For example, in a college in which the faculty- 
student load was very small, the faculty members were selected 
with reference to their qualifications for counseling, and the 
faculty adviser was the key person in the guidance program, 
the need for special personnel workers would be quite different 
from that in a college having a traditional subject-centered 
faculty.

2. The qualitative aspect is neglected. In two colleges, both
reporting individual interviews with students, one might have
interviews of a high quality, while the interviews in the other
institution might be perfunctory and even detrimental. Similarly,
autobiographies might be used in one college to help students
to gain self-understanding, and in another college they
might increase the students’ insecurity and anxiety. In one 
college the cumulative records might be kept up to date and 
used much more effectively than in another institution. The 
check list or scale type of evaluation does not supply data on 
the important qualitative aspects. 

3. The effect of the qualifications and services on the students
is not known; in other words the crucial question of evaluation 
is not answered, namely, “Do the procedures we believe to be 
effective really make desirable changes in students, in groups, 
and in the community?” 

In studying the personnel work in a college, little progress 
has been made in defining concretely the changes that should 
result from an effective personnel program. Last year at the 
annual convention, one large group pooled their opinions on 
this subject and listed specific changes in students’ behavior and 
attitudes, faculty cooperation, group activities, and in the com¬ 
munity, which they thought should be the outcome of person¬ 
nel work. 


Evaluation of Different Services 

Educational and vocational guidance are two services that 
have most frequently been subjected to evaluation. Much dis¬ 
satisfaction has been expressed regarding the usual criteria of 
success of vocational guidance—number of positions held, 
length of time positions were held, reasons why person left the
position, reports by employer of worker’s proficiency and job 
satisfaction of worker. Obviously, a combination of these cri¬ 
teria is more satisfactory than any single item. In his evaluation 
of the State Consultation Service at Richmond, Virginia, Froehlich 2
moved toward a more adequate combination of criteria:
criteria of occupational adjustment and personal adjustment,
the client’s attitude toward the counseling service and change
in occupation, and his preparation for the job. Admirable as this
effort is to obtain the most accurate opinions and to apply statistical
methods as a test of the reliability and validity of the
ratings, it has certain important limitations, clearly recognized
by the investigator:

1. The agreement between the interviewer’s and counselor’s
rating for occupational adjustment was not as high as desired.

2. Some of the questions are ones on which the client would
not be expected to have much basis for judgment, such as the
relative value of different counseling procedures, especially as
the client's attention was not focused on the process.

3. The interviewer's basis for rating the client’s adjustment
was meager.

4. Much more information is needed about the individual’s
capacity and the environmental conditions that might make
vocational and personal adjustment either easy or difficult for
him, overriding, as it were, the effect of the counseling service
per se.

2 Clifford P. Froehlich. “Toward More Adequate Criteria of Counseling Effectiveness.”
Educational and Psychological Measurement, IX (1949), 1J5-67.

A much more specialized aspect of evaluation of the college 
advisory system is to be presented at this meeting by Frieden- 
berg. This represents an ingenious and detailed attempt to 
have the recipients of the service evaluate faculty advisers. 
From such an evaluation the faculty adviser can obtain many 
helpful suggestions for the improvement of his services. It
clarifies the areas in which the faculty adviser can best work,
and indicates the need for specialized services. The same limitation
as was mentioned in the preceding study holds here,
namely, the students' inadequate basis for evaluating a process 
in which they have had so little background of experience or 
study. However, the concrete cases do give the student an opportunity
to focus attention objectively on the counseling process.
After having obtained this information the problem of
appraisal is still unsolved. Who is right: the student or the
person who has studied counseling and psychotherapy? 

Evaluation of Psychotherapeutic Counseling

Considerable work has been done on evaluating the non-directive
interview. Much of this has been along the line of
showing increased insight on the part of the client as the interviews
continue. The assumption is that insights expressed in
the interview are in themselves evidences of adjustment and 
will affect life adjustment. This assumption has been ques¬ 
tioned. Consequently, evidence of adjustment in life situations 
over a long period of time has been considered the only valid 
measure of the success of the psychotherapeutic interview. 

Even this criterion has its limitations insofar as environmen¬ 
tal conditions may be so destructive as to prevent the good 
adjustment that might have taken place under ordinary con¬ 
ditions. Another limitation is the lack of evidence of the indi¬ 
vidual’s initial capacity for adjustment. If the client’s problem 
is deep seated, persistent, and pervasive, failure to show much 
progress could not be attributed to poor counseling techniques. 

Evaluation of Group Work Procedures 

As in the evaluation of interviews, too much reliance has been 
placed on subjective evaluation of the group work process. 
Some recent studies, however, have obtained reports from the 
participants themselves and from those who have had an op¬ 
portunity to observe them some months later. For example, 
Lippitt 3 obtained evidence of actual change in the performance 
of leaders who had spent two weeks in a workshop that featured 
group discussion, role-playing in sociodrama, and interviewing. 
Both outside observers and the members of the workshop re¬ 
ported that because of the workshop they were able to do more 
effective work with their community groups. 

The College Evaluation Officer

A new position seems to be emerging in colleges and universi¬ 
ties. This is the college evaluation officer, with training in meas¬ 
urement and evaluation. This work is closely related to, and 
has often grown out of, the research function of the personnel 
department. Such an officer was described by Findley in a 
meeting of the American Educational Research Association. 
This officer would render valuable advisory service to the fac¬ 
ulty in defining objectives, developing instruments to measure 
them, assisting in the collection of data, and appraising and 
interpreting the information collected. 

3 Ronald Lippitt. Training in Community Relations; a Research Exploration Toward
New Group Skills. New York: Harper and Brothers, 1949.

Summary

The major limitations in evaluation studies seem to be:

1. Failure to define the outcomes of personnel work concretely
as desirable measurable changes in students, faculty
members, groups, and community.

2. A too narrow approach instead of a comprehensive study.
All of the approaches that have been used in evaluating guidance
procedures have some value. We need to know about the
staff and the procedures being employed; student opinion and
expert opinion as to the effectiveness of the procedures are helpful;
follow-up studies supply essential information on life adjustment.
The intensive study of specific techniques and the
control-group and within-group experimental methods also
contribute to our understanding of the effectiveness of student
personnel work.

3. Mass rather than individual treatment of the data collected.
Instead of studying the data collected as a group, an
appraisal of each student should be made individually in the
light of his previous progress. This is the case-study approach
to evaluation. It seems to be the only adequate way to appraise
changes in students. It enables the investigator to take into
account the student's capacity for adjustment to college and
environmental conditions that may be reinforcing or defeating
the college personnel program. A case study is made of each 
student; these records are studied individually and a judgment 
made of the student's social, emotional, physical, and intellec¬ 
tual development. These judgments may then be treated sta¬ 
tistically and checked as to reliability and validity. In the case 
study approach to evaluation the service and the research 
functions of student personnel work come together; one rein¬ 
forces the other. 
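
The statement that case-study judgments "may then be treated statistically and checked as to reliability" can be illustrated with a small example. The sketch below is an assumption about how such a check might be run today and is not a procedure described in the paper: two judges classify the same case records, and Cohen's kappa expresses their agreement after allowing for the agreement expected by chance. The function name `cohen_kappa`, the category labels, and the judgments are hypothetical.

```python
from collections import Counter


def cohen_kappa(judge_a, judge_b):
    """Chance-corrected agreement between two judges rating the same cases."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("both judges must rate the same non-empty set of cases")
    n = len(judge_a)

    # Proportion of cases on which the two judges gave the same judgment.
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n

    # Agreement expected by chance from each judge's own category frequencies.
    count_a, count_b = Counter(judge_a), Counter(judge_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("both judges used a single category; kappa is undefined")

    return (observed - expected) / (1 - expected)


# Hypothetical judgments of four case studies by two independent readers.
first = ["improved", "improved", "unchanged", "unchanged"]
second = ["improved", "unchanged", "unchanged", "unchanged"]
print(cohen_kappa(first, second))  # 0.5
```

A coefficient of this kind bears only on reliability; whether the judgments are valid remains a separate question.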



AN INVENTORY OF STUDENT REACTION TO
STUDENT PERSONNEL SERVICES

ROBERT B. KAMM

Dean of Students, Drake University

Introduction 

Increasingly, we are becoming aware of the need for evalu¬ 
ation of our student personnel programs. Now that the peak 
veteran enrollment has passed and we are faced with somewhat 
declining enrollments and the corresponding reduction in in¬ 
come, we need, all the more, to be able to take stock of the 
quality of our services. 

Just a year ago, considerable time was spent at this conven¬ 
tion in a discussion of the evaluation of student personnel serv¬ 
ices. Dean Willard W. Blaesser, then of Washington State 
College and now with the United States Office of Education, 
spoke on the subject “The College Administrator Evaluates 
Student Personnel Work” (1). Dr. John H. Rohrer, Professor
of Psychology at the University of Oklahoma, presented a paper 
entitled “An Evaluation of College Personnel Work in Terms 
of Current Research on Interpersonal Relationships” (4). 

A comprehensive review of the literature dealing with evalua¬ 
tion was presented by Blaesser. Reference was made to such 
studies as those of Hopkins (3) in 1925, Brumbaugh and Smith 
(2) in 1930, Williamson and Sarbin (5) in 1940, as well as others.
As the title of his paper indicates, Blaesser dealt with evaluation 
from the point of view of the administrator. 

But what about the student? Does he think what we have to 
offer is of value? Are our various services really functional in 
his college experience? Are we supplying those services which 
really meet his needs? How about securing “consumer reac¬ 
tion” to our student personnel services? 

The above, and other questions, were asked last year at this 
convention. In fact, there was so much interest in the general 
subject of evaluation that the Program Committee has again
seen fit to provide a session in which the problem may be discussed.

A Student Reaction Form.—For some two or three years now,
Dr. C. Gilbert Wrenn, Professor of Educational Psychology at
the University of Minnesota, and I have been experimenting
with a student evaluation form for student personnel services.
In its various stages of refinement it has been used at a number
of institutions with limited success. Recently, an “all-out effort”
has been made to eliminate some of the remaining “bugs”
and we feel that now we may have an instrument which is
reasonably valid and which can be functional in the evaluation
of student personnel services.

Actually, the form has been designed with the thought in
mind that it might well be used in conjunction with an evaluation
form which Dr. Wrenn and I described in an article in
School and Society in 1948 (6). The earlier form, entitled “An
Evaluation Report Form for Student Personnel Services,” is for
the use of trained personnel workers and combines judgments
with regard to institutional philosophy toward the program
and actual evidence of specific services. The present form, used
in conjunction with the previous form, should give a comprehensive
evaluation of a student personnel program in that
reactions of both students and the trained personnel worker
are utilized.

Often judgments are made, relative to the value of a service,
on the basis of a few students’ reactions to a question or two.
The present form is based upon the principle that if several
pertinent questions about a particular student personnel service
are asked of a sufficiently large random sample of the local college
population, a valid indication of the worth of the service to those
students will be available.

Sixty questions, five for each of twelve different services,
comprise the present form. The twelve services listed below are
those ordinarily included in any balanced program. The titles are
self-explanatory with the exceptions possibly of “Adjustment of the
Institutional Program to Student Needs” and “Guidance in
Student Conduct.” The former illustrates the point of view
that no institution can have an effective student personnel
program unless the institution as a whole is functioning in the
interests of the same student needs that the personnel services 
are designed to serve. The five items in this area provide an in¬ 
dication as to whether or not the total institutional emphasis 
is in this direction. 

“Guidance in Student Conduct” is so stated in an attempt 
to place a particular emphasis on discipline. This emphasis is 
a counseling and learning emphasis in which students respond 
to items which indicate their sense of the justice of disciplinary 
procedures, and the extent to which discipline is a learning 
experience. If the policies relating to student conduct are con¬ 
sistent with the belief that each student who violates a regula¬ 
tion should be counseled and helped to learn from the experi¬ 
ence, with punishment following only (1) when punitive action
seems necessary for learning and (2) when necessary to restrict
in the event no learning seems possible, then discipline can be 
a personnel service. 

The five items in each area have been designed with the 
thoughts of achieving (1) the maximum coverage and (2) the
best possible representation of the service, using a minimum 
number of questions. The items have been reviewed with the 
above in mind by various trained workers in the student per¬ 
sonnel field. 

The twelve services and a sample item for each follow: 

Recruitment and Admissions 

Do you feel that, previous to your admission, representatives 
of this institution adequately explained to you the facilities 
of this campus? 

New Student Orientation 

Do you think that this institution made you as a new student 
feel a part of it and of its activities? 

Counseling Services 

Do you feel that students on this campus who most need 
counseling are receiving such help? 

Health Services 

Are you satisfied that your campus health authorities would 
handle your case competently, in the event you were injured 
or became seriously ill?

Housing

Do you feel that this institution is making sufficient effort
to improve student housing facilities?

Food Services

As a rule, do you feel satisfied with the food served you at
the campus cafeteria or dining hall?

Extra-Class Activities 

Do you feel that there are enough student [...] and
activities on the campus to meet the [...] students?

Adjustment of the Institutional Program to Student Needs 

Do you feel that your total college or university experience
is such as to better prepare you for intelligent citizenship? 

Student Financial Aids and Part-Time Employment 

If you were “financially on the rocks,” would you feel free to
go to the campus financial aid service for help and counsel? 

Placement Services 

Is your placement office making sufficient effort to keep you 
informed of current employment trends and needs? 

Student Personnel Records 

Are you of the belief that you are welcome to discuss with a 
counselor all matters contained in your student personnel 
folder? 

Guidance in Student Conduct 

Will a student on this campus get a chance to explain his
side of the case if he is “called up” for discipline?

The sixty items as they appear in the form have been ran¬ 
domized in order to minimize bias and to insure a maximum 
chance that each item will be answered independently. 

Administration of the Form.—It is recommended that a ran¬
dom sample of at least 200 students of the local college or 
university population be utilized in any study involving the use 
of this form. In order to determine the needs of various groups 
on campus, participants in the study are asked to check those 
of the following which are appropriate for them. 






____ Male      ____ Freshman           ____ Live Off-Campus in Rooming House
____ Female    ____ Sophomore          ____ Live at Home
               ____ Upperclassman      ____ Live in College Dormitory
               ____ Transfer Student   ____ Live in Fraternity or Sorority House

Major Department ____


If one is to have a sufficiently large N from which to form 
judgments when considering any one of the above areas, it is 
necessary to have a reasonably large sample with which to 
begin. 

Participants in the study may indicate “Yes,” “No,” or “?”
in answer to each of the sixty questions. All items are so worded 
that if the service is functioning properly in the judgment of 
the student the “Yes” will be checked. If the service is inade¬ 
quate, the “No” will be indicated. 

The “?” is meant for use only in those cases where the stu¬ 
dent has insufficient knowledge of (or experience with) the 
service to make a “Yes” or “No” response. If an informed
judgment of the adequacy of a service cannot be made, then
use should be made of the “?”.

Students are not asked to write their names on the form— 
only to answer the questions honestly and thoughtfully. 

Scoring of the Form.—A Tally Sheet is provided which allows
for (1) the tallying of responses to each item and (2) the group¬
ing of these item responses for each of the services. (Each of the 
twelve services has a maximum “Yes” score of 500 for each one 
hundred students who participate in the study.) 

Following completion of tallying, numbers of “Yes,” “No,” 
and “?” responses should be converted to percentages, using 
as the base N the total number of students participating. If one 
is considering only the responses of a sub-group, the number of 
students in that group should be used in computing the per¬ 
centages. 
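
The tallying and conversion to percentages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the data layout (a flat list of marks for one service), and the sample figures are hypothetical and are not part of the Tally Sheet itself. The base for a whole service is taken as five items times N, which is the maximum "Yes" score mentioned above.

```python
from collections import Counter

ITEMS_PER_SERVICE = 5  # five questions per service, as in the form


def service_percentages(responses, n_students):
    """Convert one service's marks to percentages.

    responses: flat list of "Yes", "No" or "?" marks, one per student per item.
    n_students: base N (all participants, or the size of a sub-group).
    """
    total = ITEMS_PER_SERVICE * n_students  # maximum possible "Yes" score
    if len(responses) != total:
        raise ValueError("expected %d responses, got %d" % (total, len(responses)))
    tally = Counter(responses)
    return {mark: 100.0 * tally[mark] / total for mark in ("Yes", "No", "?")}


# Hypothetical data: 100 students, so the maximum "Yes" score is 500.
marks = ["Yes"] * 340 + ["No"] * 110 + ["?"] * 50
print(service_percentages(marks, 100))
```

For a sub-group, the same function is called with only that group's marks and that group's N.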

If one wishes to consider only the “Yes” and “No” responses, 
i.e., only the definite judgments relative to the adequacy of the 
service, then one will need to use varying N’s in computing 
the percentages, due to the probable variation in “?” responses
for the twelve service areas.

Interpretation of Data.—One can assume that the higher the
percentage of “Yes” responses for a particular service, the more
adequate that service likely is in the judgments of the students.
It is suggested that the services be regarded as adequate if the
“Yes” responses approximate two-thirds or more of the total
responses. (This is an arbitrary figure and a lower or higher
percentage may be used if desired.) If less than two-thirds of
the participants believe that the service is adequate, in terms
of the five aspects of the service represented by the five items,
then that service should be examined.

The “?” responses are to be used when there is a lack of
familiarity with the service. The presence of even a low percentage
of such ([...] per cent or over, let us say) indicates the
need for better lines of communication to the students. Often
students are poorly informed as to the existence of the services
that are provided for them.

The presence of a considerable number of “?” responses
should not be interpreted to mean inadequacy of the service
itself, but, rather, to be indicative of the need for a program
of selling and of informing students of the services available. 
Actually, to have a strong program of student personnel serv¬ 
ices means little unless the various aspects of the program are 
known and are functional in terms of meeting student needs. 
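
The two readings suggested above (the two-thirds guideline for "Yes" responses and the separate meaning given to "?" responses) might be applied mechanically as follows. This is a rough sketch, not a procedure given in the form: the function name and the wording of the notes are hypothetical, and the cutoff for "?" responses is left as a value the evaluator must supply.

```python
TWO_THIRDS = 100.0 * 2 / 3  # the guideline above, described there as arbitrary


def interpret_service(pct_yes, pct_question, question_cutoff, yes_cutoff=TWO_THIRDS):
    """Return short notes for one service from its "Yes" and "?" percentages."""
    notes = []
    if pct_yes >= yes_cutoff:
        notes.append("adequate in students' judgment")
    else:
        notes.append("examine this service")
    # "?" responses point to a communication problem, not to a poor service.
    if pct_question >= question_cutoff:
        notes.append("improve communication about this service")
    return notes


# Hypothetical figures; the 10.0 cutoff is an illustrative choice only.
print(interpret_service(pct_yes=58.0, pct_question=12.0, question_cutoff=10.0))
```

Keeping the two checks separate mirrors the distinction drawn in the text: a low "Yes" percentage calls for examining the service, while a high "?" percentage calls for informing students.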

It is to be expected that underclassmen will indicate a lower
percentage of “Yes” responses in the area “Extra-Class Ac¬
tivities” than will upperclassmen. Acquaintance with, and op¬ 
portunity for participation in extra-class activities, generally 
increase the longer one is on campus. 

Likewise, the percentage of “Yes” responses in the area 
“Placement Services” should be greater for upperclassmen. 
This service is especially designed for those approaching gradu¬ 
ation and has less meaning for underclassmen. A high percentage
of “?” responses should be expected of underclassmen in
this area.

The evaluator may wish to compare the percentages of
“Yes,” “No,” and “?” responses of one group on campus with
those of another (for example, dormitory personnel with off-campus
students). By utilizing appropriate tests of significance,
one can be confident that differences found to be statistically
significant are real and not the result of chance errors of sam¬
pling. With such evidence at hand, one’s conclusions will have 
greater meaning than they would, were there no statistical 
treatment of the data. 
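
The text does not name a particular test of significance. One common choice for comparing the "Yes," "No," and "?" counts of two groups on a single item is the chi-square test of independence, sketched below as an assumption rather than as the author's method. The counts are hypothetical, and the critical value 5.991 is the standard tabled figure for 2 degrees of freedom at the 5 per cent level.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts (list of rows).

    Assumes every row and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts of "Yes", "No", "?" on one item for two groups of 100.
dormitory = [70, 20, 10]
off_campus = [50, 35, 15]
stat = chi_square([dormitory, off_campus])

CRITICAL_05_DF2 = 5.991  # (2 - 1) * (3 - 1) = 2 degrees of freedom, 5% level
print(round(stat, 2), stat > CRITICAL_05_DF2)
```

The comparison is made item by item here so that each student contributes one response to the table; pooling the five items of a service would count each student five times and overstate the evidence.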

Finally, in the interpretation of data, it is well to keep in 
mind the goals and particular emphases of the institution. If, 
for example, the college or university provides a limited budget 
for a particular service or for the entire organized student per¬ 
sonnel program, then it is probable that there will be a definite 
ceiling on the percentage of “Yes” responses for that service or 
program. Mention is made of the above because of possible 
criticism which may be inappropriately directed at certain 
capable student personnel workers who have inadequate pro¬ 
grams due to insufficient institutional support. On the other 
hand, one must always be objective and critical of any low
“Yes” response and examine carefully the service to see if the
maximum is being achieved within the framework and limita¬ 
tions provided by the institution. 

Summary 

In order to ascertain the worth of a product it is well to 
question the consumer of the product. Such is true with regard 
to student personnel services. Accordingly, a student reaction 
form, containing sixty questions, five each for twelve commonly 
accepted student personnel services, has been devised. Through 
study of the proportions of favorable and unfavorable responses 
to the questions asked, one can determine certain program 
strengths and weaknesses, insofar as students are concerned. 
Use of the present form also permits one to secure data relative 
to the institution’s success in actually making known to stu¬ 
dents the student personnel program it offers.



THE MEASUREMENT OF STUDENT CONCEPTIONS

OF THE ROLE OF A COLLEGE 
ADVISORY SYSTEM 

EDGAR Z. FRIEDENBERG 
University of Chicago

Most colleges and universities provide some kind of counsel¬ 
ing service for students. These services appear to have stemmed 
primarily from two functions; the organization of a student’s 
program in such a way that requirements for degrees and for 
admission to professional schools may be met efficiently, and 
the enforcement of regulations deemed necessary by the college 
for the discharge of its responsibilities to students, parents and 
community. In many schools little connection has been per¬ 
ceived between these functions; program-planning occurs at 
registration, under the direction of the Faculty; breaches of 
regulations or aberrant and unsocial student behaviours are 
treated as disciplinary problems by the Dean of Students, or, 
even, separately by sexes in the office of the Dean of Men or 
Women, as in each case is appropriate.

With the increased influence of psychology on professional 
education (3) has come greater insight into the unity of the 
educable personality (4, 5). As a consequence, the division be¬ 
tween emotional, disciplinary, and academic problems has been 
perceived as unreal (1, 8). Students make vocational choices
based on fantasy or emotional tension; students fail in programs 
because of intrapunitive personality trends, hostility to au¬ 
thority, or inferiority feelings; students behave lawlessly out 
of an appetite for punishment which grows on what it feeds on, 
or in acting out fantasies so complex and deep-seated as to 
render disciplinary action, however severe, an extraneous fac¬ 
tor whose meaning is distorted by the same mechanism which 
precipitated the behaviour. Not all students do these things, of 
course, but, unless the admissions policy of a college is so in¬ 
effective or rudimentary as to admit large numbers of students
who are simply too stupid to succeed, or too poor to have time
to study after they finish their part-time work, it is clear that
emotional factors must be involved in most of the academic or
disciplinary problems which do occur, whether these are confined
to a small group of students or are prevalent in the student
body as a whole. 

Even so, however, there remains the fundamental question 
of the degree of responsibility which a college has for the emo¬ 
tional welfare and personality structure of its students, and 
the administrative question of how such responsibility is to be 
discharged, if accepted. It is always possible to set up quasi-clerical
bodies whose function is to excrete unsuccessful or
unconforming students. Education is, however, a systematic 
process by which human behaviour is changed in directions 
which the student accepts and the faculty deems good and de¬ 
sirable. To limit the techniques of changing behaviour to those 
which can be applied from the lecture platform, and the effec¬ 
tiveness of education therefore to those who can immediately, 
realistically, and maximally profit by those techniques, seems
short-sighted and intransigent and, in many cases, cruel. To do
so with a student body composed in large part of youngsters 
seems irresponsible. 

At the University of Chicago, whose College, as is well known, 
accepts students after the second year of conventional high 
school, no such limitation has ever been considered. There is a
complete student health service, extending from orthopedics 
to psychiatry. There is a Counseling Center, using client-cen¬ 
tered techniques (6), to which any student can turn, without 
charge, for assistance in “thinking through” questions with 
which he is concerned. There are conventional vocational guid¬ 
ance services. There is not, since all University facilities are 
thought of as contributing ultimately to intellectual develop¬ 
ment, a University Mortician; one can only say in defense of 
the lacuna that few students develop a need for the services of 
such an official while in residence and none has ever applied
for them. 

Within the College of the University, and peculiar to it, is 
also the College Advisory System. This consists of a staff of 
approximately (the number varies slightly from year to year)
twenty advisers in the College, usually devoting from one-
fourth to one-third of a full-time assignment to the advisory 
service, and carrying a case load averaging 110 students. While 
it cannot be said that a systematic philosophy of advising 
underlies the system, it has adhered to certain principles since 
its inception. One of these is that all advisers shall be primarily 
members of the College Faculty, devoting their major effort to 
teaching or research. The purpose of this is to insure familiarity 
with the operations of the College, so that the adviser may dis¬ 
charge his administrative functions accurately. Another is that 
students need not be assigned to an adviser of the same sex, 
the purpose of this policy being to dispel the atmosphere of 
obsession with the erotic which has characterized many student 
personnel services of a more conservative orientation. A third 
is to assign each student, so far as possible, to an adviser with 
special qualifications in his intended field of professional or 
academic specialization; but since students in the College have 
virtually no opportunity to modify their programs of general 
education so as to contribute directly to their vocational goals, 
this policy has been modified so that students are not assigned 
special advisers until they are near to the completion of their 
work in the College or have made definite plans for advanced 
study or professional training. Students are not allowed to 
choose their adviser, but are usually, on their request, re¬
moved from the list of the adviser to whom they have been 
assigned and placed on the list of the adviser whom they prefer, 
or whose special academic field is the one in which they are 
most interested, if he has room for them. 

New students normally meet their advisers for the first time 
at a twenty-minute registration conference at the opening of 
the year; students are admitted to the College only at the 
opening of the Autumn Quarter. At this time the adviser 
registers the student for an entire year, and files, without dis¬ 
cretion, on the basis of placement-test results, the student’s 
program for the Bachelor’s degree. No administrative device 
has succeeded, despite much thought and worry, in making 
these conferences anything but rushed and unsatisfactory; an 
inept adviser can, during registration, infuriate, confuse, or 
frighten as many as sixty new students, although the average
is doubtless somewhat below this number. After registration
students may sign up for a fifteen-minute appointment with 
their adviser at any time they wish, for any reason they wish 
during the period of eight to ten hours per week which the ad¬ 
viser allots for the purpose. They may also be summoned to see 
the adviser at his discretion, almost always to discuss academic 
problems. The adviser's signature must be obtained to any 
change of registration initiated by the student. 

It may be seen, therefore, that the College Advisory System 
operates in an almost totally academic context. In a school 
which admits only intellectually qualified students, and pro¬ 
vides fairly generously for assistance to those who need it, most 
academic problems, however, seem to originate in a disordered 
perception by the student of his situation and responsibilities, 
accompanied, of course, by the underlying anxieties and regres¬ 
sions which give rise to the need to misunderstand. There is a
question, then, as to how much insight into the emotional
origins and significance of academic problems a subject-matter 
specialist can be expected to acquire in order to be most helpful 
in solving them. But there is a deeper and more controversial 
question than this on which the responsible adviser must take 
a position. In every college there are a number of students
whose academic success is enhanced, rather than hindered, by 
aspects of their personality which seem likely to result in great 
ultimate unhappiness. There are students who use preoccupa¬ 
tion with abstract theoretical material to distract themselves 
from personal and social inadequacies. There are students who 
seek academic distinction in order to flaunt it in defiance of a 
culture which they believe to disparage it. There are students 
who are convinced that they can only be valued because of 
their scholastic achievements, and who are ceaselessly driven 
to seek grades as copper tokens to exchange for affection at a 
very unfavorable rate. What is the responsibility of the adviser 
for the welfare of such students? Must he train himself to rec¬ 
ognize them? If he can recognize them, should he seek to ini¬ 
tiate personality changes which will probably make the stu¬ 
dent's academic record less spectacular, even if they also result 
in greater happiness and ultimately greater productivity and 
creativeness?

The answer to such questions depends on a complex hierarchy
of values, which certainly cannot be established by empirical 
investigation alone. It is clear, however, that student expecta¬ 
tions of the Advisory System are one of the factors which must 
affect the decision. No administrator can build an advisory 
service in response to student demand, which is always partially 
conflicting and made in partial ignorance of the administrative 
limitations of the particular situation. If, however, a certain 
kind of service is believed by students to be a responsibility of 
the Advisory System, although no administrative provision is 
made for it, a situation which will engender hostility, and which 
is dangerous if the service is important, exists. On the other
hand, if students are convinced that a particular kind of serv¬ 
ice is not the responsibility of the Advisory System, and would 
not seek it there even if it were offered, that service can prob¬ 
ably not be offered to students effectively within the System, 
particularly if it is a counseling service which must, ultimately, 
always be voluntarily received. 

The author, therefore, sought to develop an instrument which 
would measure four things: (1) student opinion of the scope
desirable in the College Advisory System; (2) student informa¬ 
tion about the system as it actually exists, to permit an estimate 
of the degree to which criticism and opinion might be regarded 
as informed; (3) student evaluation of the effectiveness of the 
System in solving certain problems which it recognized as pos¬ 
sible sources of weakness in itself; and (4) an indication of the 
kind of role with respect to themselves students believe an 
adviser should play in assisting in the solution of certain com¬ 
plex problems. Since this information seems to be among that 
which would be needed by any college in evaluating its advisory 
services, the instrument used to gather it will be described and 
illustrated in some detail. (Copies of the complete instrument 
may be obtained from the author on request.) It consists of a 
group of five batteries of objective questions, with space pro¬ 
vided for additional focussed written comment by students; 
the entire instrument requires something under two hours for 
most students to complete. The first battery consists of nine 
questions which elicit only vital statistics—age, position in the 
college, frequency with which student consults adviser, etc.
Because of the unique mode of organization of the College
few of these questions would be applicable intact to other situations,
and they will not be reproduced here. Since IBM electrographic
answer sheets were used, questions were numbered so
as to facilitate analysis, and the next battery began with item
16. Most of it is reproduced, as follows:

Below you will find listed certain problem situations which
are encountered with varying degrees of frequency among College
students. Among the resources to which a student at the
U. of C. might turn for assistance with each of these problem
situations is his College Adviser. In considering each problem
situation, feel free to draw on your own experiences with the
College Advisory System, or other information which you
believe to be valid, but try in every case to give a reasonably
generalized response, based on your conception of the system
as a whole. For each of the situations listed, on your answer
sheet blacken space

A. if you believe the College Adviser to be the best person
from whom to seek help in such a situation.

B. if you believe that the College Adviser would be the best
University staff member from whom to seek help in such
a situation, though probably less effective than experts
available elsewhere (e.g., a private psychoanalyst or firm
specializing in vocational placement).

C. if you believe that the College Adviser might be of some
help in such a situation, and that you might go to him if
you had special respect or friendship for him, but believe
that there are other more appropriately trained and
chosen University officials who could be of greater assistance.

D. if you believe that some University official should be
available to help in such a situation, but that a College
Adviser, either because of deficiencies in training, insight,
or interest, or because his responsibilities are divided
between the student and the institution, might be an
indifferent or even dangerous source from which to seek it.

E. if you cannot conceive that the University has any responsibility
to help a student with such a problem, and do
not believe that this student should seek help from any
University official.

PROBLEM SITUATIONS 

16. Student is fearful of failing his comprehensive examinations,
even though he has been working and has made
passing grades in the Autumn and Winter Quarters.

17. Student must work to remain in school, and finds that in
order to clear enough time to keep a job, he must petition
to get into sections of classes that are listed as closed.

18. Student has stolen an automobile and later abandoned it.
He has not been detected, but fears that he may be, and
anxiety is disrupting his work and his life.

19. Student is making mostly C’s, with an occasional D and 
still less frequent B. The death of his father makes it impossible
for him to continue in school without substantial
financial aid. 

20. Student cannot bring himself to study; if he sits at his desk
and attempts to do so, his mind wanders off into daydreams.
If he attempts to write a required paper, or other
written exercise, the blocking is particularly intense. 

21. Student wishes to enter medical school in the shortest
possible time, and wants help in planning his program of 
studies most efficiently. 

22. Student is uncertain whether the qualifying examination 
in Humanities 1 (Special Art) can be taken as part of a 
sequence culminating in Humanities 3 (German) in fulfill¬ 
ment of the requirements for the A.B. degree, and if so, 
whether Language 1 is still a requirement or not. 

23. Student has gotten into serious difficulty as a consequence 
of sexual relations, and is now in a state of panic at the 
prospect of having to choose between an undesired mar¬ 
riage or exposure and parental discipline. 

24. Student, not living in a residence hall, has participated 
in a group which went to a Gerald L. K. Smith meeting 
to break it up. Eggs were thrown, and the student is now
being held by the police. 

25. Student has a mild interest in becoming a lawyer, which 
is in accord with his parents’ wishes. He is not certain 
that his interest is very real, or that he has the pattern
of abilities which lead to success in this field, and is begin¬ 
ning to feel anxious. 

26. Student is troubled with severe headaches, of undeter¬ 
mined origin, which are making it impossible for him to 
study and causing him to fail his work. He notices that 
they are followed by periods of listlessness and depression.

27. Student has purchased a portable typewriter from a store 
in the University community, and signed an installment 
contract to pay for it. He has found several mechanical 
defects in the machine, and wishes to return it and get his 
money back. The store, however, threatens to sue him 
for the balance of the money. 

28. Student does not understand the process by which his
placement has been made and wishes to have the meaning
of his placement scores explained to him, as he feels he
should have been excused from Mathematics 1 and Social
Sciences 2.

29. Student has developed a very strong emotional attachment
to his roommate, who is now no longer willing to
“pal around” with him as at first. The roommate has
requested a change of room assignment, and the student
is troubled by suicidal impulses, and terrifying dreams
in which he is murdered by his former friend.

The reader will doubtless grant that nearly every type of 
problem is represented in the battery, from the purely academic 
to the highly aberrant and clinical. These last were included 
not because an adviser is likely to encounter them but in order 
to permit students to express the most extreme demands pos¬ 
sible on an Advisory System if they wished. 

The next battery, the only portion of the instrument to
which a right answer “key” in the usual sense of examining is
possible, consisted of twenty true-false statements about the
Advisory System. Examples are “Penalties may be invoked to
compel a student to register for those College courses which
his adviser recommends that he take during a particular year,”
“Most College advisers carry a case load of approximately fifty
students,” “College advisers receive special training in the realistic
handling of the emotional problems of students.”

The fourth battery, consisting of 15 questions, would be
adaptable to almost any academic situation, and is reproduced 
below in its entirety. 

In the College Advisory System, as in every administrative 
structure, the performance of the functions characteristic of 
that system is limited by problems of facilities and procedures. 
Sometimes these limitations can be overcome by ingenuity
and special [...]. Often they persist as sources of dissatisfaction
to student and adviser alike.

Below you will find listed a series of such limitations which
you may or may not feel apply to the College Advisory System.
In considering each limitation, feel free to draw on your
own experience with the College Advisory System, or other
information which you believe to be valid, but try in every
case to give a reasonably generalized response, based on your 
conception of the system as a whole. For each of these, on your 
answer sheet blacken space 

A. if you feel that this problem is almost always satisfactorily
overcome by the College Advisory System, or is
one with which it should not be concerned anyway.

B. if you feel that the problem is often satisfactorily overcome
by the College Advisory System, but is nevertheless
the source of occasional annoyance.

C. if you feel that the problem is recognized by the College
Advisory System, but is mishandled about as often as it is
solved, or has been solved by halfway measures.

D. if you feel that the problem is one which may usually
be expected in contacts with the College Advisory Sys¬ 
tem, although you are occasionally surprised by success¬ 
ful handling of it. 

E. if the problem is almost always troublesome in student 
contacts with the College Advisory System to which it 
is related, and there is no satisfactory evidence of effective 
attempts to solve it. 

61. Providing of enough time at each interview to permit
students to complete the business for which they sought
an appointment.

62. Keeping individual advisers close enough to their schedules 
that students need not wait too long for their appoint¬ 
ment, or miss class time because of late advisers. 

63. Finding persons to serve as advisers who are warmly in¬ 
terested in students and their problems, and who know 
their students as individuals. 

64. Keeping the case load per adviser low enough to permit 
advisers to get really acquainted with their advisees and 
their problems. 

65. Keeping student conference material confidential, and not 
revealing it to persons who might use it in damaging ways. 

66. Knowing accurately the right members of the University 
to whom to refer students with special problems—e.g., 
reading deficiencies, or presumed errors in recording com¬ 
prehensive results—and helping students to get in touch 
with those people. 

67. Providing office facilities which insure as much privacy as 
students need in order to discuss freely with their adviser 
such problems as they wish. 

68. Assigning as advisers persons with sufficient insight into 
the emotional and developmental tasks of young people to 
really understand what’s going on inside them. 

69. Keeping records sufficiently up-to-date, accurate, and 
available that advisers do not act on mis-information. 

70. Conveying to students an attitude of respect for them as 
people, and conducting interviews with courtesy and gen¬ 
uine friendly feeling. 

71. Getting advisers to shut up long enough to permit stu¬ 
dents to express their own feeling about problems fully. 

72. Assigning as advisers persons of sufficient maturity that
they need not “use” students emotionally, by bullying, 
identifying too much with them and their problems, mak¬ 
ing demands on the student for liking or admiration, or in 
other, more subtle, ways. 

73. Providing advisers sufficiently mature emotionally to listen
to any problem students might wish to discuss with them
without becoming “shocked” or frightened, or attempting
to impose standards of conduct which the student does not
accept.

74. Scheduling sufficient hours per adviser that students can
get to see an adviser when they need to, without having
to wait for attention with their problem unsolved. 8

75. Providing sufficient information on “summons” forms that
students are not caused needless anxiety as to the possibility
that they may be in trouble.

76. Limiting the scope of the adviser’s activity sufficiently
that students are not obliged to discuss with him matters
which are not properly his business.

The fifth battery, although it contains but five items, is perhaps
the most interesting in the questionnaire. It is intended
to appraise the role which students think it appropriate for the
adviser to fill, and consists of fictitious case studies, each of
which presents a rather serious student problem, followed by
a choice of five courses of action which the adviser, confronted
by such a problem, might take. The student is asked to indicate
the choice he believes best, and is given space for written
comments in which to suggest other courses he might judge
preferable. The items follow:

91. Student is afraid that he will fail comprehensive examinations
in German and Mathematics. In the course of his
first interview with the adviser, he reproaches himself
severely for his failure to study, but states that, as soon
as he begins to try to do so, his mind wanders off into
daydreams. He is a good jazz musician, and is in demand by
many of his former high-school friends to lead a small
orchestra at their social events. When he agrees to do this,
his parents attack him, pointing out that he has never
been as smart as his elder brother, that he is wasting his
time and their money, would probably have a hard time
succeeding at the University of Chicago in any case, and
must surely transfer to an easier school if he fails an
examination.

The boy, as he tells this story, seems much hurt and uncertain,
but is inclined to agree with the low estimate
placed by his parents on his character and intelligence.
Entrance aptitude test scores secured by the University
place him well among the upper tenth of applicants
admitted.

A good adviser would 

A. sympathetically but firmly support the parents’ demands
on the boy, advising him to give up the orchestra
until he is more certain that he can carry
his school work.

B. tell the boy unemotionally that the decisions must
be his, but reiterate for him the precise requirements
for continuing registration in the College.

C. say only enough to make it clear to the boy that his 
feelings of anxiety, rejection and conflict are under¬ 
stood and accepted. 

D. sympathetically point out that the boy has a right
to make any decisions about his total program of
activities which will best satisfy him, while making
sure that he understands both the conditions under
which he may continue in school and the real abilities
he has been shown to possess.

E. point out that the key to the situation is probably 
the hostility his parents feel toward him, as shown 
by their desire to underrate him, and his resultant 
fear that, should he succeed, they will completely 
reject him. 

92. Student, an eleventh-grade entrant, seventeen years old, 
has been placed on probation because of a failure to at¬ 
tend required physical education classes. She is also failing 
two of her subjects. The instructor in one of these has 
turned in a sympathetic report, indicating that he be¬ 
lieves the girl to be intelligent and creative, but too much 
burdened by her personality difficulties to accomplish 
much at this time. The other report is aggressively crit¬ 
ical, describing the girl as unkempt and lazy, and declaring 
that she has no place in the College. At the conference to 
which she is summoned, the girl appears shy, nervous, and 
so far as possible, uncommunicative. 

A good adviser would 

A. point out to her in a kindly but resolute way that 
she will surely be dropped from school if she does not 
make a better academic adjustment, and help her 
to schedule her week’s work so that she can begin to 
make effective use of her time. 

B. restate to her, in as neutral a tone as possible, the 
conditions under which her registration may be ter¬ 
minated, but emphasize that the decision must be 
hers. 

C. let her know that he understands that she must be
feeling threatened and unhappy and express clearly
a wish to help her understand her own feelings better,
while pointing out calmly that they must also
meet the practical situation in which she is involved
in order to go on working together.

D. suggest that she drop the course taught by the hos¬ 
tile instructor, and use the extra time to catch up 
on her other work. 

E. point out to her that her unkemptness, laziness, and
uncooperative attitude are quite evidently ways of
rebelling against authority and are almost certainly
derived from her feelings about her parents rather
than from any real aspects of her College situation.

93. The program of an 11th-grade entrant has been erroneously
prepared by his registration adviser, who checked
Biological and Physical Sciences rather than Natural
Sciences 1, 2, and 3 as requirements for his degree. The error
is noted shortly before the beginning of the student’s second
year in the College, and the student is notified that the
requirement has been changed and that he must now take
the Natural Sciences sequence. The student has not yet
registered for either Biological Sciences or Physical
Sciences, and could not have begun work on Natural Sciences
1 during the previous year because of poor mathematics
placement, so that he has not, in fact, suffered as yet by
the error. He is nevertheless quite upset by the change, as
he wishes to enter an engineering school, believes that
Physical Science will serve him in better stead than Natural
Sciences 1, does not want to take an additional
comprehensive, and is angry about the inefficiency of the
adviser in making such an error. He comes in to ask that
the original statement of his degree requirements be kept in
force.

A good adviser would

A. apologize for his carelessness in making the error,
but point out that since it has not as yet affected the
student's program, the requirement should stand
as corrected.

B. state firmly that error or no error, the degree requirements
for 11th-grade entrants are uniform and must
be consistently administered.

C. note carefully the student's reasons for wanting to
keep the old requirements in force, then take the
matter to the Dean of Students in the College, admit
that the original error was his, and ask the Dean to
stand behind the old requirements.

D. himself prepare an amended program for the student,
reaffirming the original requirement, and send
a copy of it to the Registrar for recording.

E. point out to the student that it is irrational for him
to be angry over an error which has, in fact, done
him no harm, and try to help him to gain insight
into the true sources of his annoyance.

94. An 11th-grade entrant has a schedule which requires that
he take Physical Education at 1:30. He schedules a conference
with his adviser at which he complains, with some
indignation, that this program is not acceptable to him,
because it interferes with his freedom of worship. It has
been his custom, since the age of ten, to read a chapter of a
religious work daily after lunch; if he does not do so, his
food disagrees with him, and he suffers from bloating and
heartburn. He believes it to be dangerous to his health to
take exercise while in this condition, but maintains stoutly,
and unasked, that this does not bother him at all, since he
is prepared to meet his Maker at any time. He does, however,
insist that, rather than risk the moral obloquy thus
involved, he will simply refuse to attend physical education
classes. There is no way to arrange his schedule so
that he can either lunch at 11:30 or take Physical Education
then without either petitioning for admission to three
closed class sections or getting the Physical Education
Department to make an exception to its rule and let the
student come two days a week at 11:30 and two days at
1:30.

A good adviser would 

A. let the boy go ahead and petition, regardless of the
improbability that three petitions would be granted
for such a reason, in the hope that he might change
his mind when finally confronted with so nearly
impersonal a reality.

B. attempt to persuade the Physical Education De¬ 
partment that the boy’s emotional need is impor¬ 
tant and real, and that it should make an exception 
in this case. 

C. say neutrally and dispassionately to the boy that the 
University does not recognize this kind of fantasy 
as religious in character, and cannot accommodate 
itself to such diversity of need; tell him frankly that 
if he does not attend compulsory physical education 
classes, he will be removed from the College. 

D. tell the student that it is pretty clear that some fac¬ 
tor besides religious conviction is operating to pro¬ 
duce symptoms of this kind, that the responsibility 
of the University to him and his parents requires 
that it insist he report to Student Health for a com¬ 
plete medical and psychiatric examination, and that 
his program may more profitably be discussed in the 
light of the report which Student Health will make. 

E. discuss with the student the religious meaning of 
his position, pointing out that it must derive from 
an unusual conception of God, and suggesting that 
he scrutinize his own emotional needs as the source 
of the conflict.

95. A twenty-year-old student, who entered the College at the
13th-grade level at the opening of the previous scholastic
year, is making satisfactory grades, both on his comprehensives
at the end of his first year and on quarterly
examinations. Reports from his instructors in Humanities
[course number illegible] and History of Western Civilization
commend him for his brilliant contribution to discussion,
and his evident capacity to integrate the material offered
into abstract generalizations. Reports from his instructors
in Biological and Physical Sciences indicate that he has
hardly ever attended classes in these courses, although he
passed the comprehensive in Biological Sciences with a
grade of C.

The student's adviser, in an informal discussion with the
Head of the residence hall in which the student lives,
learns, however, that the student is regarded by the Head
as somewhat lacking in emotional adjustment. He has
taken no interest in House social activities, and, so far as
is known, has few social interests of his own. His friendships
within the House are confined to two other boys,
with whom he has discussions nearly every night centering
on the Marxist interpretation of the motivations of
contemporary politicians, or the unity and structure of
contemporary drama, or the nature of reality. He has
twice been sent back to his room from the dining hall
because he came in to dinner without coat or tie.


A good adviser would 

A. do nothing about the situation, on the grounds that
he has no right to interfere with what evidently
represents the boy's free choice of behavior, so long
as he is academically successful.

B. summon the boy for a general discussion in the
course of which he would expect to describe to the
boy in detail the range of interesting activities
available at the University.

C. attempt to show the House Head that the behavior 
of the boy might very well indicate more complete 
achievement of the objectives of the College than 
that shown by nominally better adjusted students, 
and urge him to encourage the boy’s present mode of
self-expression.

D. summon the boy for a conference in which he would 
cautiously attempt to estimate how happy the boy 
really was, and, if considerable anxiety and unhap¬ 
piness were indicated, try to get him to discuss the 
possibility of seeking help from the Counseling Cen¬ 
ter or a psychiatrist. 

E. summon the boy and explain to him that his present
behavior shows serious maladjustment, is probably
more the result of his need to rebel against the patterns
of middle-class behavior established by his
parents than of a serious interest in his studies, and
suggest that he work the problem through with the
adviser.

Detailed results of the administration of the instrument will
not be presented here, since it is hard to see how they would
be of more than local interest. A brief account will be given,
however, as an example of the way the questionnaire may be
handled, and the kind of results to be expected from it.

A letter describing the questionnaire, and stating that it had
been prepared jointly by the Offices of the Dean of Students
and University Examiner, was sent to every seventh student
on the list of each adviser, requesting him to come fill out the
instrument at his choice of four specified times. Since independent
results were wanted, this seemed more desirable than
sending the instrument to the student, who would, in many
cases, have then filled it out in consultation with others. At
the time this was done, the Chicago Maroon, the official student
newspaper, editors of which had been present at all sessions
where the questionnaire had been planned, carried editorials
urging student co-operation. In all, 161 students, or slightly
less than half of those who were invited, filled out the
questionnaire. The composition of this sample was scrutinized by
the Dean of Students in the College, who declared it to be 
adequately representative, so far as crude statistical factors, 
i.e., length of residence in the college, age, sex, level of ad¬ 
mission, etc., were concerned. The sample could not, however, 
have been representative of student attitude, since it is quite 
clear that the large proportion of students who did not respond 
must have felt differently about the Advisory System than 
those who were willing to give it some time. One would as¬ 
sume, in the absence of more specific information, that stu¬ 
dents who felt most strongly about the system, whether posi¬ 
tively or negatively, would be likely to respond, while the 
indifferent would ignore the request; such an inference could be 
checked only by an aggressive interviewing program in which
contact was established with a good sample of those who
refused to co-operate.
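
The selection rule described above (every seventh student on each adviser's list) is a form of systematic sampling and might be sketched as follows. This is only an illustration: the function names are hypothetical, and the choice of starting with the seventh name rather than some other offset is an assumption, since the text does not say where counting began.

```python
def every_kth(students, k=7):
    """Return every k-th name from an ordered list (the 7th, 14th, ...).

    Assumption: counting starts at the top of the list, so the first
    student selected is the k-th one.
    """
    return students[k - 1::k]


def build_invitation_list(adviser_lists, k=7):
    """Apply the same systematic selection to each adviser's list."""
    return {adviser: every_kth(students, k)
            for adviser, students in adviser_lists.items()}
```

Because the rule is applied separately to each adviser's list, every adviser with at least seven advisees is represented among those invited, though, as the text notes, this does nothing to correct for differences between students who respond and those who do not.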

The results on the objective portion of the instrument cited
in this article, for the total group of 104 boys and 57 girls
responding, are presented in the following table. For items
16-29, the figures given refer to the number and percentage of
students marking the item A, B, C, D, or E. For items 31-50,
the figures are a frequency distribution showing the number of
students making various total scores on this twenty-item
“true-false” test of information. For items 61-75 the same
information is given as for 16-29, with two additions. These items, as
the reader may perceive by referring to them, constitute a rating
scale on which students appraise various problems which
the Advisory System may have met more or less effectively.
Space A represents a highly favorable appraisal on a particular
item, space B a moderately favorable one, space C a neutral
or ambivalent one, space D moderately unfavorable, and space
E, highly unfavorable. In order to provide some quantitative
indication of the relative success of the system in solving these
problems, the following device was invented. The number of
students choosing to rate each item A was multiplied by 3;
the number of students marking it E, by -3. Those marking
it B were counted in as 1, those marking it D, as -1, while
C responses were ignored (multiplied by zero). The products
thus obtained were added algebraically for each item, and the
number thus obtained is reported as a Derived Score in the
column D.S. The Rank column simply indicates the rank of
these scores, a low number indicating a highly favorable student
response to this aspect of the service. The maximum possible
score would be 3 × 161, or 483; the minimum -483.
Astonishingly, but gratifyingly, no negative scores are obtained.
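
The Derived Score just described can be restated as a short computation. The sketch below is illustrative only: the weights come from the text, but the function names and the response counts in `example` are hypothetical and are not taken from the table. Ties in rank are not given any special treatment here, since the text does not say how they were handled.

```python
# Weights as described in the text: A = +3, B = +1, C = 0, D = -1, E = -3.
WEIGHTS = {"A": 3, "B": 1, "C": 0, "D": -1, "E": -3}


def derived_score(counts):
    """Return the algebraic sum of weighted response counts for one item."""
    return sum(WEIGHTS[choice] * n for choice, n in counts.items())


def rank_items(scores):
    """Rank items so that 1 is the most favorable (highest) derived score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: position for position, item in enumerate(ordered, start=1)}


# Hypothetical counts for two items; each row sums to the 161 respondents.
example = {
    61: {"A": 40, "B": 60, "C": 30, "D": 20, "E": 11},
    62: {"A": 20, "B": 50, "C": 50, "D": 30, "E": 11},
}
scores = {item: derived_score(c) for item, c in example.items()}
ranks = rank_items(scores)
```

If all 161 respondents marked A, `derived_score` would return 3 × 161 = 483, which agrees with the maximum stated above; all E responses would give -483.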

Similar data have been gathered for six subgroups of the
population which took the questionnaire. These groups are:
54 1948 entrants, who had had but a few weeks' experience
with the College; 41 11th- and 12th-grade entrants aged 18 or
younger; 31 students having been assigned to three or more
advisers during the course of their college career; 62 students
answering correctly eleven or fewer of the twenty true-false
information items; 35 students blackening space E (maximally
unfavorable) for two or more of items 61-76; and 97 students
choosing unpopular responses (that is, responses other than
91D, 92C, 93A or C, 94B or D, or 95D) on two or more of the
“case-study” items 91-95.

Three different kinds of free responses were sought from each
student. The first and major source of these was the following
paragraph, presented at the close of the objective portion of
the material:

On this sheet please suggest any specific changes in the College
Advisory System which you believe would increase its
effectiveness. Feel free to suggest any that seem important to
you. It is suggested that you center your thinking around such
possible areas of change as:

1. Professional qualifications of advisers.

2. Case load of advisers.

3. Scope of advisory service, i.e., increasing or decreasing
the range of kinds of problems with which advisers deal.
Do you feel that advisers, as they now function, are a
threat to freedom or privacy of students? Do you, on
the other hand, feel that they are too much concerned
with routine academic problems to offer you the help you
need? What changes would you suggest?

4. Intercommunications between Instructors, House Heads,
and Advisers.

5. Means of establishing the working relationship between
student and adviser as [illegible] as possible.

Students were also asked to list any characteristics of the
Advisory System not included in items 61-76 which seemed to
them especially worthy of favorable or unfavorable comment,
and to state any specific course of action which they would
prefer to any of the five listed, with reference to items 91-95.
These comments have been examined rather critically, and, so
far as they are subject to classification, tallied quantitatively.
Twenty-five (of the 161) students responding to the paragraph
quoted above expressed a feeling that the case load of
advisers should be limited, the most common single suggestion
made. Twenty-three felt that advisers should be warmly
interested in students, and 17 felt that more attention should be
given to personal problems of students. Sixteen students felt
advisers should receive training in adjustment counseling
procedures, or psychology, while five more also felt this to be the
case if emotional problems of students were really a part of the
advisory responsibility, but were not quite prepared to concede
that they were.

On the other hand, a smaller group of students displayed
contrary and apparently more intense feeling. Eleven students felt
that advisory and counseling services should be kept separate,
or that the adviser should not be concerned with personal
problems, while one student put the feeling on the basis that
advisers should not deal with problems which students would
ordinarily discuss with parents. Fifteen students took what
might be termed a middle position, viewing the advisory function
as mainly academic, but feeling that advisers should be
able to direct students intelligently for help when needed. A
similar and related contrast was apparent in student wishes
concerning the degree of interrelationship between advisers and
dormitory and academic staff. Fourteen felt this should be
increased, while nine felt this to be undesirable.

Of particular interest was the extent to which students
conceived the Advisory System as playing an important role in
interpreting the purposes and values of the University of Chicago
College Plan to them, a function which, it must be admitted,
was almost completely ignored in the instrument itself.
This feeling was expressed in a variety of ways, and is therefore
less conspicuous on the tally than it would have been had
the attitude found expression in a single, often-repeated
sentiment. Nine students expressed a direct wish for assistance in
the synthesis and interpretation of their College learning
experiences. Six, expressing less positive feeling, expressed a need
for more assistance in orienting themselves to the University.
Six also wanted this help specifically in connection with the
function of the Advisory System itself, with two expressing
definitely the feeling that the System should more clearly define
and state its own purposes and limits. Some anxiety was
expressed at the failure of the College to take students’ vocational
ambitions adequately into account; five felt that special
advisers trained in a professional field, e.g., for pre-medical
students, should be assigned as needed; four, that more help should
be given the student in making plans to enter a Division on
completion of general education. There was surprisingly little
complaint about the non-voluntary nature of the system; only
four students wished to be allowed to choose their own adviser,
while nine asked that regular meetings be scheduled at intervals,
regardless of their felt need, in order to check on their
progress.

Few items were added to those in 61-76 of the instrument
by student commentators, and none by more than four persons.
Rather curiously, four students stated here a belief that
students should have an adviser of their own sex; three wanted
more technical advice about planning for a vocation; three
felt that advisers should be commended for trying to help
students with personal problems, and should do more of it; and
three felt that warmer, friendlier relationships would be
desirable; two wanted more psychological or clinical training. On
the other hand, three students felt that advisers should stick
to impersonal or academic problems, and not pry into others,
and two, that academic and emotional counseling should be
kept separate.

Reactions on the case study items were, as was expected,
interesting and revealing. Seventeen students, as might be
expected, recommended referring the boy of item 91 to a source
of more specialized psychological care, in most cases medical.
Fourteen, however, wished the adviser to intercede directly
with the parents to get them to understand the boy better.
Other recommendations were largely partial or palliative:
financial aid, so that he could live away from home, assistance in
scheduling, and the like. Four specifically enjoined the adviser
to follow through on what would evidently be a long and
difficult case.

On item 92, psychiatric aid, recommended by 27 students,
was virtually the only cogent suggestion to emerge. Three
students, however, recommended a stern attitude.

Perhaps the most striking characteristic of the responses on
item 93 was their almost uniform hostility. Only five students
recommended that the adviser attempt to get the consent of
the University to the maintenance of the existing, erroneous
agreement as the student wished, which was, indeed, the course
of action successfully undertaken in a closely parallel case which
suggested this item. Nine students urged that the adviser
politely but firmly require the student to take the Natural
Sciences program. Fourteen recommended that the adviser explain
to the student the advantages of the Natural Sciences sequence,
and its greater consonance with the objectives of the College.
Three students recommended an aggressive firmness, one of
these stating that “a few spankings when he was younger”
might have helped the student, and another stating that he
should “know when to keep his mouth shut.” As the item, as
presented, gives no intimation of the personality of the student
involved (intentionally so, since this item was chosen to
measure student reaction to a purely administrative situation
without clinical aspects), this evidence suggests that many
students in the College are highly identified with its objectives,
and highly intellectual character, (7) and are inclined to
exempt the College Plan in the abstract, though not the staff or
administration, from duty as a target for rebellion.

Great, indeed, is the contrast presented by responses to item
94. While eight students recommend referring the boy to a
psychiatrist, and three suggest that he be required to conform,
seven state that the program must be changed, because not to
do so would infringe on the boy’s freedom of religion; five, in
this case, suggest direct appeal to the administration to insure
that the case is not handled legalistically. One student states that
the adviser “must do everything possible to maintain the boy's
faith.” Three suggest that the boy be referred to an official of
his own church.

On item 95, as might be expected, more of the commentators
were concerned about the possibility
that the student might be coerced, or that his privacy might
be unduly invaded, than were concerned about his ultimate
fate. In the main, they were not hysterically so, and there was
considerable acceptance of the dangers which such a student
might be piling up for himself. Nine students felt that
interference of any kind was unjustified, or that the behavior of the
boy was not peculiar. Five, however, thought counseling should
be given; three, that the student should be introduced around;
and six, that his old interests should not be discouraged, but
that he should be led to develop new ones. It should be
emphasized with reference to this item, as with the other four in its
group, that students did not make comments unless they wished
to amplify or reject the five already available to them in the
instrument; reference to Table 1 will show that most of the
students were able to accept one of the positions presented in
the item.

What inferences do these data suggest, with reference to the
questions raised as to the scope and responsibilities of a college
advisory system? Perhaps the most interesting and suggestive
is the rational picture which students seem to have of the
Advisory System and of its limitations. They recognize that
in a situation providing as varied services as the University of
Chicago, its function is primarily academic. This would not
seem to indicate, of course, that students regard the degree of
insight into the sources of their difficulties which an adviser can
muster as unimportant; rather, that they do not expect
advisers to develop sustained clinical relationships with them.
Evidence for this comes both from responses to items 16-29
and from the “case study” items.

Nevertheless, many students who are well aware that certain
problems are psychiatric, and that advisers are not
psychiatrists, still consider that the University has a responsibility
to assist them with such problems, and believe the adviser to
be the most appropriate source to which to turn for aid,
doubtless as liaison to professional sources. Note particularly
responses to items [illegible] and 29.

Students tend to regard as outside the scope of University
service their legal problems (note items 18, 24, and 27), and
perhaps others in which they feel its role would most likely be
punitive (item 23). They do not tend to regard their emotional
problems as, per se, outside the scope of University
responsibility (items 20, 26, and 29).

Students base their opinions of the Advisory System on a
fair amount of information. The mean of 12.2 out of a
possible 20 seems high, especially in a sample containing 54 1948
entrants who had been at the University less than a quarter,
and who made a mean score of 10.9 themselves.

Responses seem to support a common-sense view of the
advisory function. The unanimity of responses on the case-study
items seems to indicate very little disagreement among students
as to what they want from advisers, and the comments seem
to bear this out. They want warmth, understanding and
acceptance of their goals and purposes. Where necessary, they want
intercession on their behalf. They do not want advisers to play
psychoanalyst at them, but it should be borne in mind that
an adviser who would do so would not be behaving at all as
would a real psychoanalyst attempting to help the same
individual. The students do not, therefore, reject the concept of
therapy, but the possibility of its being used by someone else
to act out his own problems, a good thing for anybody to
reject. Most of them accept as desirable, according to responses
to item 95, the intercession of the adviser on behalf of an
academically successful but troubled student. Many see dangers
in this, however, and a few react strongly against it.

There is a remarkably conservative, and, if one may say so,
uncritical and middle-class orientation of student values,
related to religious and personal freedom. There is strong, and,
again, perhaps uncritical identification with the College, its
purposes, and what they conceive to be its mores (2). Evidently,
students do view the system as a part of the total educational
service of the institution, and expect its functions to be
modified in the light of, or perhaps even determined by, the
institution's purposes.

Internal criticism of the inferences made is possible, though
laborious, by a statistical analysis of differences among the
sub-groups on relevant items. For example, if one reason the
data cited fall into the pattern observed is that students view
the Advisory System realistically, and do not attribute to it
psychoanalytic functions, one would certainly expect of younger
and less experienced students that they would make choices
indicating somewhat more dependence than the rest of the group.
An examination of item 29, response B, reveals that 56 per cent
of students in the 11th and 12th grades, aged 18 and under,
regard the adviser as the most appropriate University official
to approach with this highly clinical problem, as compared to
only 37 per cent of the remaining group. This difference is
significant at the 5 per cent level, yielding a critical ratio
of 2.1. On the other hand, 22 per cent of the younger group
choose response D, as compared with 31 per cent of the remaining
group, which is not a significant difference; this, too, is
perhaps explicable, since a significant difference on this
response would indicate positive disillusionment with the system
with growing independence and maturity, which presumably does
not occur. On item 95, 10 per cent of the younger group choose
response A, as compared with 24 per cent of the remaining group,
a difference yielding a critical ratio of 2.3, and to be
expected in view of the greater independence of the older
adolescent or young adult. Fifty-six per cent of the younger
group, as compared with 48 per cent of the remaining group,
however, choose response D, a difference in the expected
direction, but not significant and not sufficient to constitute
evidence that the older group repudiates the assistance of the
system in solving personal problems.
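
The sub-group comparisons above rest on the critical ratio for a
difference between two percentages. The following is a minimal
sketch of one common way to compute such a ratio, using a pooled
standard error; it is not necessarily the formula the author
used. The sub-group sizes are not reported in this passage, so
the sizes of 50 and 70 below are purely hypothetical, and the
resulting values are illustrative and will not reproduce the
ratios quoted in the text.

```python
import math


def critical_ratio(p1: float, n1: int, p2: float, n2: int) -> float:
    """Critical ratio (z statistic) for the difference between two
    independent proportions, using the pooled proportion to
    estimate the standard error of the difference."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error


def is_significant(ratio: float, threshold: float = 1.96) -> bool:
    """Two-sided test at the 5 per cent level (|ratio| >= 1.96)."""
    return abs(ratio) >= threshold


# HYPOTHETICAL sub-group sizes; the passage does not report them.
n_younger, n_remaining = 50, 70

# Item 29, response B: 56 per cent versus 37 per cent.
ratio_b = critical_ratio(0.56, n_younger, 0.37, n_remaining)
print("response B:", round(ratio_b, 2), is_significant(ratio_b))

# Item 29, response D: 22 per cent versus 31 per cent.
ratio_d = critical_ratio(0.22, n_younger, 0.31, n_remaining)
print("response D:", round(ratio_d, 2), is_significant(ratio_d))
```

With these assumed sizes the first difference clears the 5 per
cent threshold and the second does not, which matches the
pattern of the comparisons described, though only the actual
sub-group counts would give the reported figures.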

If the results obtained by applying the instrument described
at the University of Chicago are representative, then, it seems
that while students feel that they need warmth and understanding
and that the University is obligated to provide help with
personal problems, they are not likely to misuse or overburden
the source of such help. They will, in general, take as much as
can be given of what they need. The more psychological insight
which the advisers in a system possess, and the more clearly the
system outlines its scope to include service with personal
problems, the more students will expect of it and use it. Some,
however, will become frightened and hostile, and most expect
enough initiative to be left to them to permit them to feel
respected, rather than manipulated.

THE ROLE OF STUDENT GOVERNMENT IN THE
STUDENT PERSONNEL PROGRAM

BROTHER LOUIS

Dean, St. Mary's College, Winona, Minnesota

There are so many different interpretations attached to the
term, student government, that it would seem almost necessary
to open this discussion with a definition of it. However, I am
going to sidestep that responsibility, and hope that my
definition of student government will gradually be recognized
from what I have to say about it. I doubt that a concise
definition could be given which would not be subject to various
interpretations. And so, for the present, by way of preliminary
explanation but not as a complete definition, I will say only
that when using the term, student government, I have in mind a
student organization composed of the highest elected officers of
the student body, having very definite and real responsibilities
for all student life and student activities on the college
campus, and working in close conjunction with faculty, student
body, and administration. My comments will be directed towards
three main points: (1) the place that student government should
have in the total personnel program of the college, (2) the
functions it should fulfill, and (3) the conditions that are
necessary in order that it can effectively carry out these
functions.

Student government must be an essential and integral part
of the total personnel program of the college because it is the
one means for accomplishing those aims of the personnel program
which are related to and achieved by group living and group
activities. While other areas of college personnel services are
concerned primarily with the student as an individual, the area
of student government is concerned with the student as a social
being, in relation to both the college community and the other
social environments in which he will live. It is the means for
unifying all efforts of the college toward the education of the
student as a social being. Since, then, it is one of the
personnel services, it should have the same recognized status,
the same prestige, and the same freedom to operate within its
sphere of responsibility as, for example, the health service. It
should also, by its very nature, have the same
all-pervasiveness with respect to the whole college program as
the counseling services.

This implies that the program of student government comes
directly within the scope of responsibility of that
administrative officer of the college who has general charge of
all student personnel services. On most campuses this would be
the Dean of Students. It also implies that the authority and
responsibility which the student government has are delegated
and not absolute, that is, delegated by the administration to
the student body to be exercised by the elected officers of that
body in accordance with a constitution accepted by the student
body, the administration, and the faculty. If correctly
understood, this places no real restriction on the student
government, since, just in the same sense, the authority of the
Dean of Students or the Dean of the College is delegated and not
absolute. The crux of the matter is really the good judgment of
the higher administrative officers of the college, who can
either make or break the program of student government according
to their attitude toward it. If restrictions are imposed to such
an extent that there is no possibility that the student officers
will make mistakes, then the program is doomed to failure.

Briefly, then, the student government is an integral part of
the student personnel program of the college, and it has a
delegated authority which comes to it through that
administrative officer who has been charged with the general
responsibility over all personnel services.

In order to merit and maintain the status that it should have,
the student government has several important functions to
fulfill. The major ones, I would classify as follows:

1. It should have the responsibility for the operation and
control of all student organizations of the college campus.

2. It should have the responsibility for promoting, organizing,
and directing what might be termed all-college functions and
programs, that is, those which involve the whole student body
and not just one particular organization or group.

3. It should have a definite responsibility for the formation
of policies concerning all student life and student activities
of the campus.

4. It should provide the means for achieving mutual
understanding and close cooperation between students, faculty,
and administration.

Each of these functions needs some explanation. Under the
first of them, the responsibility for student organizations,
would come the reviewing and approving of constitutions, the
setting up of standards, the auditing of books, the supervision
of social and other affairs of these organizations, such as
dinners or dances, the education of officers of these
organizations, the authorization of student concessions, the
supervision and control of student publications and student
bulletin boards, the fostering of wide student interest and
participation in the various campus organizations, and so on.
Much can be done by the student government toward the education
of officers and members of these organizations in their duties
and responsibilities. Sponsoring and directing leadership
workshops open to all students is one, providing consultation
services is another, and developing brochures giving helpful
suggestions is a third. Two such brochures which I recently
received from Washington State College are excellent examples of
what can be done. One is called "Mr. Chairman" and explains the
rules of order concisely, yet adequately. The other is called
"Officers' Blueprint" and has many good suggestions and
recommendations.

The second general function of student government stated
previously is the responsibility for what we called
"all-college" affairs and programs. This would include, first of
all, "all-college" social functions and affairs, such as dances
or similar functions, which are common to all colleges. Other
types of programs would perhaps vary from campus to campus. As
illustrations of those for which the student government can
assume full or partial responsibility I would suggest the
following: campus activities for the annual homecoming, field
days, the relations of inter-college student associations with
the student body, parents' weekend, the orientation of new
students, and convocation programs. We can include here, also,
the sponsoring of student forums and inter-college conferences,
on student problems or on pertinent topics of the day. If the
college has a student union with a union board to direct the
activities centered there, this also, I believe, should be
placed under the general responsibility of the student
government.

The third general function of student government concerns
the formation of policies. Conditions of student life and the
operation of student activities are certainly of great concern
to the student body as well as to the faculty and
administration, and policies concerning them are much more
effective if the students have a voice in their formation. The
formation of such policies should be a cooperative or joint
responsibility of students and faculty. (The term "faculty" will
sometimes be used loosely here to include both faculty and
administration.) Hence, a joint student-faculty committee,
meeting weekly, is a practical necessity for this purpose. Such
a committee has been set up on a number of campuses. The
committee should be appointed by the president of the college,
with the student members designated by the student government.
Its purpose is to draw up policies governing student life and
student activities at the college. It should have the same
recognized status as do all of the other committees appointed by
the president. The student government should have the
responsibility not only of designating the student members of
this committee, but its approval, as well as that of the
faculty, would be required before any of the policies proposed
are accepted. It can also assist the committee by the
recommendation of points for incorporation in the policies to be
proposed.

Considering, now, the last of the stated functions of student
government, it is obvious that a college educational program
can operate effectively only in an atmosphere of mutual respect
and understanding and cooperation between the three major
groups which compose the college community: students, faculty,
and administration. There must be an opportunity for free
discussion and interchange of ideas between all three. The
student government furnishes an effective instrument for
achieving this desired result, if channels are provided for
direct approach to each of these major groups.

For contact with the student body, a necessary means would
be a student convocation, monthly or oftener, conducted entirely
by the student government. Contact should also be maintained
through the medium of student publications. Other means would be
the holding of meetings open to the student body, and having an
office and definite office hours when students can come to
present and discuss problems, questions, and suggestions.

For contact with the faculty and administration, there is,
first of all, the faculty or administrative adviser and the
student-faculty policy committee mentioned earlier. Other means
will vary according to local conditions. On our own campus
several changes have been made as our program progressed.
Originally we had regular business meetings of a college
council, composed of the student government and a committee from
the faculty and administration, of those directly concerned with
student problems. While this body had other functions, our main
idea was to educate the student and faculty groups to an
understanding of and respect for the viewpoints of the other.
However, so much has been done to modify viewpoints and
attitudes of both students and faculty as to make the lengthy
discussions we used to have unnecessary. Questions or problems
which arise now are taken up directly with the proper
administrative officer concerned or with the student-faculty
policy committee, and settled promptly and satisfactorily. Since
business meetings of this college council are no longer
necessary, we are changing this year to informal luncheon
meetings in order that the two groups will still keep in close
touch with each other and so safeguard the harmony in attitudes
and thinking.

Originally, too, we held separate meetings also of the faculty
committee which was part of the college council. This would
be poor practice if the intention had been to form a united
front to present before the student officers. Our intention,
really, was to modify the somewhat extreme viewpoints of one
or two of our members, and this enabled the joint meeting with
the student group to proceed more smoothly as a result. We
have since discontinued this separate meeting of the faculty
committee.

Another means which makes for good relations is for the
faculty and administration to consult with the student
government even on those problems and policies over which they
retain final decision. If changes in policy are explained
beforehand, together with the reasons which influenced the
decision, much better understanding and cooperation on the part
of the students can be achieved.

A program of responsible student government often requires
much patience in the beginning. At first there is quite likely
to be a distrust and sparring for advantage which delays
progress. There is also, at first, a tendency on the part of
students to be preoccupied with very petty problems and to
ignore the really important ones. This is not their fault since
they have not had any previous education in this respect.
However, with tact and patience this difficulty can be overcome,
and the final result is worth the effort.

It may seem that I have completely ignored the area of
student conduct and student discipline in relation to student
government. This is not my intention since I regard it as being
included under each of the four major functions discussed.
Through the student-faculty policy committee, for example,
the student government has a very definite voice in formulating
policies regulating student life and student conduct. And the
responsibility for activities of student organizations or for
the whole student body entails a responsibility for student
conduct in connection with such activities. Further, it is my
conviction that the student government can also assume the
responsibility for supervision of student conduct in residence
halls. I have not placed this as one of the major functions of
student government because it seems to me that too great a
stress on the area of student discipline implies a negative
rather than a positive approach, and can lead to a neglect of
other important areas. It also surprised me at first that for
the most part, the men students, at least, do not care to assume
responsibility for the supervision of student conduct in
residence halls unless there is considerable dissatisfaction
with the way this supervision is being handled. At a conference
on student government which our students sponsored earlier this
month, and which was attended by delegates from about
twenty-five colleges, this question was discussed at some
length. According to the report I received, only one of the men
delegates was strongly in favor of having the student government
assume responsibility for the supervision of student conduct in
residence halls. All were very anxious to have a voice in
determining the policies, but, in general, did not want to go
beyond this.

As a final point, I would like to comment on some conditions
which are necessary for the effective functioning of any
program of student government. I will pass over the necessity of
having the authority and responsibility of that body clearly
defined, as being too obvious to need comment. Outside of that,
the most essential condition is that there be good relations
between the student body and the faculty and administration.
If the faculty lacks confidence in the student group and its
representatives, if it is unwilling to take time to discuss
fully with them questions and problems of mutual concern, thus
ignoring the educational possibilities this affords, then the
program is foredoomed to failure. The result will be that in the
students' minds the college community will be composed of two
rival factions, students versus faculty, each struggling
against the other. The administration must take all possible
means to prevent such a situation. Some means which can be used
have been indicated earlier, but even these will fail unless
the viewpoint of the faculty and administration is one of
respect for and confidence in the student group.

A second necessary condition for an effective student
government program is provision for insuring continuity of
policy and the education of student officers. If each newly
elected group must start from scratch, there will be no
appreciable growth. One practice which is good and is also quite
common is to have the new members elected early enough so that
they can sit in at all of the meetings of the present members
until the time comes for them to take office. Another means
which our own student government uses seems to me to be even
more fruitful. In late Spring they take the officers-elect away
to a camp for a two-day orientation. Several members of the
administration and faculty are invited also. The time is spent
in discussing with the newly elected officers the problems which
came before the student government during the past year, the
solutions arrived at, the policies established, and the projects
and plans for the future. The specific purpose is the education
of the new officers in their duties and responsibilities. It
really works.

A comment might be made also on the size of the student
government body. This also is a contributing factor to
effectiveness. If the group is too small it becomes ineffective,
or is in danger of becoming the tool of pressure groups. If it
is too large it becomes unwieldy, and tends to lose the "esprit
de corps" which should characterize it.

This brief discussion of student government does not pretend
to exhaust the subject. I have attempted simply to explain
those points which have impressed me most strongly when
working with students. Many others could undoubtedly be
added. The one general impression I have from my own
experience with students and their officers is their willingness
and their ability to show a strong sense of responsibility, to
be mature in their judgments and to understand and discuss
intelligently the problems involved in dealing with people.
Capitalizing on these student traits goes a long way toward
achieving our educational objectives and toward developing the
educated leaders we want our college graduates to be.



STUDENT PERSONNEL WORK AND THE 
NATIONAL STUDENT ASSOCIATION 

GORDON KLOPF 

Chairman, National Advisory Council, N.S.A., University of Wisconsin

I bring you greetings from the National Staff and the National
Advisory Council of the United States National Student
Association. It is gratifying to both the Council and the
Staff that the American College Personnel Association has
always had a place for discussion of the National Student
Association on its convention program. This is perhaps rightly
so, for who should be more concerned about a program affecting
students in over three hundred colleges than the personnel
workers in those colleges? Before we explore the role of the
college personnel program and staff persons in relationship to
the National Student Association, let us observe what NSA is
doing and what its future plans are.

In studying the objectives and programs of NSA, we find a
great emphasis given to the importance of training students
for citizenship. To serve this end, NSA is urging the
development of the campus as a community, a community of
students, faculty, administrative, clerical and service staff,
as well as regents, trustees and alumni. To make a community
philosophy function, students must be represented on major
committees and boards, particularly those which affect student
life. If the campus or educational community is to be an
educational experience and a training for citizenship, it must
be more real than something we put on like "Sunday
go-to-meeting clothes." In all phases of campus life, an
opportunity must be provided for democratic processes to
function.

Few institutions have given students an opportunity to play
a part in the academic planning program. If we are to give the
student a realistic experience in democratic community
planning, it is essential that we break down some of the
distinction between the student and the teacher. As Harold
Taylor, President of Sarah Lawrence College, says, "Education is
not something done to students, it is something students and
teachers do together." Educational planning has been chiefly the
imposing of the academician's point of view upon the student;
NSA urges greater consideration of the student's point of view.
President Blanding of Vassar says, "Student opinion concerning
matters which are considered to be the chief responsibility of
the administration and faculty is extremely important,
particularly if presented in a thoughtful, constructive and
responsible manner." In California we find President White of
Mills College "deploring the lack of faculty respect for student
opinion." He thinks NSA should accept the challenge to do
something to generate a greater respect on the part of faculty
and administrative personnel for student opinion.

NSA has developed, as many of you know, a program of
student-faculty evaluation. The first edition of the program
describing student-faculty evaluation sold out shortly after it
was issued. Copies of the second edition are still being ordered
in large quantities. This publication contains basic principles,
forms, and procedures which can easily be adapted to the local
institution. Students are deeply concerned about improving
instruction, and who knows more about the instruction they
are receiving than they do?

Among the pioneer programs in student-faculty evaluation
were those at the University of Michigan, University of
California and the University of Wisconsin. Recently, one of the
departments at Wisconsin had an assembly with both faculty
and students in attendance to evaluate the role each played in
the instructional work of the department. These and similar
programs have been motivated by the National Student
Association's work in this area.

An issue which concerns many college administrators is that
of academic freedom. At the 1949 Congress, the Association
resolved, "That membership in any political, religious, or other
organization, or adherence to any philosophical, political, or
religious belief does not constitute in itself sufficient grounds
for the dismissal of faculty, failure to rehire, or denial of
tenure to educators of the United States."

In exploring the role of the student in the government of his
community, the NSA has given impetus to a tremendous
interest in student government. A number of excellent
publications in the form of booklets and mimeographed program
materials have been published by the Association. NSA has
stimulated the development of student leadership conferences,
student government clinics, and workshops on the local, regional
and national level dealing with the role of the student in the
governing of higher education.

In promoting the concept of college or university as an
educational community, the NSA realizes that most aspects of a
community must be governed by trustees, regents, deans, faculty
and administrators. It is urging, however, that student
opinion and representation be included to a greater degree on
committees, councils and boards, giving students the
opportunity of expressing their point of view and of having the
experience of working with the staff members of the educational
community. If we accept the responsibility of higher education
as being a training ground for citizenship, we need to think of
the institution as a community-structured unit with students as
well as staff members as citizens, participating in the planning
and governing of the community. It is to this end that NSA
is working.

The National Student Association is also interested in
developing "concerned citizens" among our students. To implement
this objective it has planned an extensive international
program. Almost eight hundred students will participate in the
tours abroad this year. In providing this travel program, NSA
not only saves the American student hundreds of dollars on
every tour, but is helping the student to get the maximum from
his travel experience by making the tour a "study" as well as
"sightseeing" program. The student not only sees the Eiffel
Tower but learns about the people of France through studying
the French language, the history of the French people, and
their customs, while on board the ship taking him to Europe.
In France he meets with student groups and with people of
France other than the "Cook's Tour Guide" type of individual.
When the student returns to his campus, NSA has urged him
to try to give other students some means of benefiting from
his experiences abroad. NSA has also developed a program of
work camps and has done a great deal to bring displaced persons
to the American campus. It is constantly working with other
agencies in the international field. Shortly, you will see on
your desk a copy of Youth and UNESCO, a new publication which
has been published by NSA and UNESCO. NSA has been a vital part
of the program of the World Student Service Fund. Many of you
may have heard about the expanded role of World Student Service
Fund in the program of international education. NSA has been
consulted on this program, and, as it takes shape, NSA leaders
will be involved in its implementation.

The National Student Association has also urged colleges to
permit political activities on campuses, including the
permission for speakers of all political views to appear, and
the development of political organizations on the campus. I
think it might be said that the NSA agrees with Robert Hutchins
when he says,

The policy of repression of ideas cannot work and never has
worked, the alternative to it is the long difficult road of
education, to this the American people have been committed. It
requires patience and tolerance, faith in principles and
practices of democracy, faith that when the citizen understands
all forms of government that he will prefer democracy and
that he will be a better citizen if he is convinced than he would
be if he were coerced.

The program of the Association is interested in developing
a "socially concerned" student. All phases of the National
Student Association's program are aimed at providing experiences
in inter-group and inter-personal understanding. The 1949
Congress certainly served to illustrate the importance of
students who represented different backgrounds and points of
view working together when a sub-commission dealing with a
debatable statement of policy refused to present a final draft
until the students holding an opposing point of view were
consulted and placed on the committee. Through the regional,
state, and national conferences, students of all races,
religions, political backgrounds, geographical regions, social
and economic status have an opportunity to work together. The
national congresses have taken definite stands on discrimination
in student groups and have asked member student governments to
prohibit organizations which discriminate against groups of
individuals. To help implement the best in human relations in
the Educational Community, the Association has recently
published a booklet, Human Relations in the Educational
Community. This, as well as many other program materials issued
by the Association, will give students, faculty members, and
administrators suggestions for meeting the challenge so ably
stated by President Charles S. Johnson of Fisk University that

Unless the American people solve the racial issue they
face a national defeat from within through loss of faith in their
very reason for living. We cannot rest now, or turn back the
tides, or settle the crucial issues by comfortable compromises.
We can either be courageously righteous in our belief in
ourselves, or adopt an ideology and way of life to fit our
inseparable sins.

The National Student Association is also concerned with the
economic welfare of students. As many of you know, it has
developed a Purchase Card Plan which has been successful in
many educational communities. The staff realizes that the plan
is not workable in every community and has developed other
means of helping students to meet their economic needs. The
NSA is distributing program materials concerning cooperative
stores, housing and eating groups. At the 1949 Congress, it
approved by an overwhelming majority the need for federal
scholarships to be awarded on the basis of need and ability and,
recently, the national staff participated in a conference
sponsored by the American Council on Education in the drawing
up of a bill for federal scholarships to be presented to Congress.

In concluding this section, which has given you a brief
picture of the program of the Association, I wish to say that I
agree with President Harold Taylor of Sarah Lawrence College
that “a lethargy is present in the American student body which
has resulted from the fact that our college and university
administrators and faculty have not given sufficient
encouragement and opportunity to the participation of the student in
the total life of the campus.” The National Student
Association is three years old, and I am sure you will agree with me
that it has done much to encourage student participation in
the total life of the campus. Part of the success of the
Association, however, is in your hands.
It is important this morning that we also examine the
structure and administrative procedures of the Association. The
most frequent question asked of the Advisory Council, now
that administrators are assured the Association is not overrun
with "fellow travelers,” concerns the matter of cost. Many of
us will admit the cost has been high and have urged the
Association to study the possibilities of reducing membership fees.
Since the membership has increased, dues have been reduced,
and are going to be reduced to an even greater degree. However,
it is important that we compare the cost of membership in the
National Student Association and the cost of student
government to other activities on campus. There is hardly a college
that does not spend more on debate and forensics, with
relatively few students participating, than it does on student
government or membership in the National Student Association.
The experience of a student who attends a Conference or the
Annual Congress is just as important to that student as his
participating in a debate tournament or a regional forensic
contest. Are American institutions as willing to spend money
to educate for citizenship as they are willing to spend money
to buy band uniforms, train baton twirlers, debaters and
athletes?

I believe that, basically, the problem with the National
Student Association is not the three cents it costs each student on
the campus to belong, but rather lies within its structure. The
organization nationally consists of the Student Congress, which
meets annually; the National Executive Committee, composed
of Regional Representatives, which meets between Congresses;
the Staff Committee; and Regional Organizations. The
weakness in its structure lies in the Regional organization and in
the local campus channeling. On your local campuses, the
person who should be most concerned with the program of the
Association is your student government president. Because of
the complexity of his job, he may have assigned the channeling
and coordination of the NSA materials to a special committee,
commission or coordinator. However, my experience with local
campus structures indicates that the closer the president is to
the NSA program, the more he reads and channels program
materials to proper committees, the better the purposes of the
National Association are being served. Let us take, for example,
the recent material that Ted Perry, the Vice President in charge
of Student Life, has sent to the Student Government President
concerning campus social and recreational programming. Ted
has developed an excellent collection of materials concerning
both formal and informal campus recreational programs. When
the student government president receives this, he should
immediately forward it to the Campus Social Committee, the
Union Dance Committee, the Dormitory Social Committee, or
whatever Committee or Board is concerned with planning
campus social activities. He might also refer it to the Dean of
Women or Men, the Student Activities Director or the
Dormitory Social Director. I cannot urge you as personnel people too
strongly to be sure the material that the National Student
Association distributes is read by the people who should be
concerned with the particular project. The campus that
permits these excellent suggestions to lie on the student
government president’s desk is certainly not getting its money’s worth
from membership in NSA. Again, I say it is not the cost factor
of the Association itself; it is the inefficiency of our own
student leaders in channeling the NSA program material.
Another factor is that of leadership in the Association. It has
frequently been said that “students will be students” and
cannot accept the responsibility of administering a national
organization of the scope of the National Student Association. I
think it is important that we become convinced, along with
Dean De Vane of Yale University, that it will not do to
underestimate the abilities of our young people, and that, if the
organization has not enough in itself to assure its continuation,
it ought to die. I think we also agree with the college president
who said that, “We do not want to see an aging secretariat
grow up in NSA.” However, I think we have to realize that
the mature leadership of the post-war veteran student body is
no longer present. We all realize that the American student
body is not as mature as the student bodies of two years past.
Our job as personnel workers is to be sure that we encourage our
local NSA programs and write to the national officers to give
advice and suggestions. Administratively, the Association is
meeting the problem of continuity of leadership through
having several of its officers run from February to February and
others from September to September. If ever there was a need
for a National Student Association, to develop concerned
citizens, it is at the present time.
Last of all, I would like to refer specifically to the role of the
National Student Association and the personnel worker. The
Association is established to achieve many of the same
objectives in which we, as counselors of students, are interested. If
all its printed materials, its hundreds of answers to individual
letters on local problems, and its regional and national
conferences and congresses are fully utilized by your campus, the
Association can help you achieve the objectives of your
personnel program. The local, regional and national leadership,
however, needs your help. It's up to you to carry out the lines
of the song, "Accentuate the Positive and Eliminate the
Negative." I have attempted, today, to mention just a few of the
positive contributions and significant objectives of the National
Student Association. Again, I say it is our job, as personnel
workers, to be informed about them and to help the student to
accentuate them.



CONTRIBUTIONS OF THE STUDENT UNION TO THE

Among the many personnel services that characterize the
contemporary American college, the student union is a
comparative newcomer. Student union goals may be expressed
simply:

1. To help provide a recreational program for the student
body.

2. To reduce the cost of going to college by supplying
inexpensive recreation.

3. To further fellowship and understanding by providing an
opportunity for students of different races and social
backgrounds to meet on an equal footing.

4. To promote the personal development of the students by
bringing to the union the best in the arts and by giving
the student an opportunity to participate in gracious
social gatherings.

5. To provide a situation where students participate in
self-government and learn to cooperate with others and to
take responsibility.

6. To unify the campus, large or small.

The bronze plaque at the front entrance of our own Bowdoin
Union says, "Here the fires of friendship are to be kindled and
kept burning."

What about the administration of these organizations called
student unions? The most successful unions in this country are
located on coeducational campuses and are housed in
coeducational buildings. We have seen the utter folly, even within the
last decade, of building separate plants for men and women at
opposite ends of the same campus. It sounds amusing to us
today, but it is also tragic.


Now the driving force within any student union is the
director. While about 85 per cent of our union directors are men,
some of the directors of large successful coeducational unions
are women. I know a number of coeducational unions with
excellent women directors. I shall not go into the merits of this
situation, but I shall take advantage of my position here and
refer to the director as he. The union director largely determines
the union goals because he is on the job day after day and
because he is the manager of the building. He should be in charge
of all personnel, directly or indirectly, within the union.
Otherwise, many times his hands are tied. It is difficult for some
college presidents and business managers to see this point.

The union director should be advised and assisted in policy
making by a faculty-student board. Faculty and staff members
indispensable on such boards include deans of students, student
counselors, directors of student activities, teachers of
psychology and allied personnel workers. Here is the great chance for
personnel officers to make their influence felt. On the other
hand, in a smaller institution like my own, the union director
also serves on various boards for the deans’ offices. This is a
desirable interlocking arrangement.

Student members are also indispensable on policy-making
union boards. Here, as almost nowhere else, the undergraduate
tries his wings in student government and organization. He has
a building, a workshop, a program to direct. Here is
democracy really at work.

The program of the National Association of College Unions
for nearly ten years has contained papers describing the job of
the union director and his responsibilities for coordinating his
program with that of other personnel offices. If he is not doing
so, it may be because he has been charged by the university
with the task of making a multi-million dollar building pay for
itself and has little time for anything else. I am sorry
to say that I think there will be more rather than less tendency
in the future for university officials to put pressure on the
financial rather than the personnel considerations in the
direction of college unions.

During the next few years enrollments will decrease and
student personnel staffs, including student union staffs, will likely
be reduced. The directors of the many new union buildings
that have been built recently, or those in the process of
completion, are likely to face vexing financial problems. Every effort
must be made, therefore, to avoid overlapping in services and
to coordinate the union programs with the larger university or
college personnel programs. Listed below are some important
steps that might be taken:

1. Centralized recording of the social and recreational
interests of students might overcome the expense of duplicate
records.

2. The student union organization might contribute more
effectively to the freshman orientation program of the
institution.

3. The creative arts program of the union might become an
important laboratory for academic instruction in these
areas as well as a setting where students may acquire
recreational skills or vocational tryout experiences.

4. The student union organization must go far beyond the
building itself. Returning veterans who have experienced
the possibilities of successful student union organizations
on various campuses have often established on their own
campuses effective programs without buildings or with
very inadequate facilities.

For those of you who wish to pursue the whole subject of
the student union more fully, I suggest that you consult
College Unions: A Handbook on Campus Community Centers, by
Edith O. Humphreys. This is the most exhaustive study of
student unions made in America. If you are planning a new
union building, the National Association of College Unions
stands ready to help you. Inquiries should be addressed to
Edgar A. Whiting, National Secretary at Cornell University,

What are the major issues and trends in the graduate
training of college personnel workers? We will not attempt in this
brief paper to present a definitive answer to this question. Our
purpose is to try to stimulate discussion through a rather
arbitrary selection of issues and trends.

A first step in considering this problem of training is to
answer the question, "Who are personnel workers?" We must
have clearly in mind who we are training before we can talk
about what kind of training they should have. During the past
fifteen years we have had quite a flowering of books, articles,
speeches and committee reports identifying student personnel
functions, services and workers. Despite considerable variation
and, at times, conflicts in our literature and in our practice,
we now seem to have a fairly common understanding of the
general scope and functions of personnel workers. We generally
agree that instruction, business management, public relations
and maintenance are not personnel work. But when we get to
specifics, we find the first issue to arise. The following
Committee publications will illustrate the point at hand.
In 1937 the American Council on Education brochure,
entitled The Student Personnel Point of View, included health as
one of 23 student personnel functions. In the 1949 revision of
this brochure, health functions were again included among the
17 basic elements of a student personnel program.
Also, in the 1948 report of the ACPA Committee on
Professional Standards and Training, it was recognized that health
services were one of the student personnel functions. Yet the
Committee sidestepped the issue of the training involved by
commenting as follows:

Two types of personnel services are included in the list of
functions with which we started, but not in the special
training recommendations. The first of these consists of positions
for which recognized standards are already set by some
accrediting agency. Physicians and nurses in the health service
would fall into this category . . . it would seem to be
advisable to let the persons in the administrative position under
whom these activities fall set up standards for them which
are in accordance with the goals of the program of the
particular institution.

The issue, then, is whether or not the occupational group of
personnel workers includes all those who perform personnel
functions. Are nurses serving college students personnel
workers? If they are, should they be trained as personnel workers?
Or is their primary allegiance to the occupation of nursing?
We have, on the college scene, many persons whose primary
techniques are derived from other professions, such as nurses,
social workers, speech correctionists, physicians and clinical
psychologists. In any list of personnel functions, the services
rendered by these individuals are usually included. Is
personnel work really as inclusive as the 23 functions listed by the
ACE would lead us to believe? Or is personnel work a small
nucleus rendering a unique service which is concerned primarily
with the individualization of education by means other than
instruction, maintenance and administration?

This issue leads us clearly to the next, concerning the
professional status of personnel workers. Is our occupational group
really a profession? Darley and Wrenn carefully considered
this problem and proposed eight criteria against which the
occupational group could measure its degree of occupational
professionalization. They concluded that, as a whole, student
personnel work falls short of professional status by all of their
criteria save one; namely, we do have a body of specialized
knowledge and skills. Of course, the question of whether or
not we are a profession is largely academic. For the purpose
of this discussion, it is important to recognize that as an
occupational group, although somewhat ill-defined, we do seem
to have a set of unique skills and a body of knowledge. This,
we believe, is one of the most important reasons that we as
personnel workers have for considering today the training of
personnel workers. If we did not have something unique, then
we could leave our training problems up to other disciplines.
When we needed workers we would then recruit from other
disciplines.

A thoughtful discussion of the uniqueness of our training is
found in the ACPA report just referred to. This report stressed
that:


Since all personnel workers have as their central aims the
welfare of the individual student, and his adjustment to the
college situation, both in and out of the classrooms, it has
seemed to us that training for all should be built around a
common core. This should involve information with regard to
individuals as individuals and as members of groups. It should
also include the development of skill in identifying individual
needs and problems, and handling interviews and group
leadership situations constructively.


The common core was then outlined in terms of course work,
along with a general recommendation about the need to include
supervised experiences. In addition, the report spelled out five
rather specific groupings of personnel occupations, and
indicated the desirable training recommendations for each group.

So much for this forward-looking report. We need now, for
the purposes of this paper, to make a rough and arbitrary
classification of the majority of training programs available today.
We shall admit readily that a particular program may not
fit perfectly into one of these categories. But the classification
does serve to highlight certain issues. First, there is the
“if-some-is-good, more-ought-to-be-better” type of program. This
training has its primary orientation in counseling, an applied
branch of the science of psychology. In this program, levels of
personnel workers are recognized. Those with bachelor’s
degrees in psychology are considered capable of working as
placement interviewers in the College Employment Office. They can
be resident dormitory counselors while they work on their
M.A.’s, or they may be preliminary interviewers in a
Counseling Bureau. At the next level, the M.A.’s can work in the
Dean of Students’ office, handling simple discipline and loan funds,
or they can be counselors in the Veterans Counseling Bureau.
By taking more training in psychology, these persons may earn
a Ph.D. They are then eligible to move up to the top level.
Here they can become the Dean of Students, or if they are so
inclined, can get into the teaching end and train more
personnel workers. This program is characterized by adding more and
more training in counseling, upon the apparent assumption
that the higher the counseling skill the better the personnel
work.

Now there is a second type of training program, namely, the
"to-each-his-own-specialty" type. These programs recognize
areas of specialization. By enrolling in this type of training
program, students can be trained as vocational counselors. Their
training may duplicate, in part, that of a personal counselor, but
the training program will tend to accentuate differences and
special skills, rather than skills common to all personnel workers.
Earlier in this discussion we pointed out a pertinent example
of this "area of specialization” approach in which certain
special groups, for which standards were set by some other
profession, were accepted as personnel workers. It is our belief
that this type of program is based upon the assumption that
there is not a body of specialized knowledge and skills in
personnel work. Rather, the personnel functions are a
conglomerate of occupations, under a single banner, and not a single
occupation with a variety of specialties. If we carry this belief
to its logical conclusion, each of us as personnel workers would
have our primary home in some other professional land. In
fact, we would think of ourselves as psychologists, or
dietitians, or vocational counselors, who just happen to be working
in a college.

In some quarters there is strenuous opposition to this second
point of view. And this opposition has been the motive power
behind the establishment of a third type of training program,
namely, the “be-a-generalist, be-an-educator” type. The training
program is quite logically designed to provide a broad basis
in education.

The students get this in courses in the principles, history
and philosophy of higher education, and in methods of
educational supervision and administration. Primarily, this point of
view stresses the setting in which college personnel workers find
themselves.

This point of view, while it recognizes the importance of the
setting, fails to recognize the unique body of knowledge and
skills which personnel workers should possess.

These somewhat facetious and critical descriptions of
training programs highlight three of the important elements which
we believe should be characteristic of every training program.
College personnel training programs should be designed to
provide for levels of specialization, areas of specialization, and the
setting in which the students will work. We recognize that real
problems will arise in organizing a program which takes
cognizance of all three elements. These problems can be pointed
up by considering two types of workers which are now
ordinarily found in college personnel programs, namely, the
counselor and the college nurse. At the present, the college nurse
is trained under medical auspices. Her status is accepted by
the medical profession, and to it she owes her primary
allegiance. She is truly trained to serve in her area of
specialization. However, she receives little, if any, training in the
specialized knowledge and skills of personnel work. Likewise, her
training for service in the educational setting is neglected.
Under the program proposed, this nurse would receive her training
to the full competency which she now has in her specialty, but,
in addition, she should receive the common core of training
which was specified in the 1948 ACPA Committee Report,
previously referred to. She should be trained so that she
understands the goals and objectives of educational institutions and
of the personnel program in them, and of her role in that
program. With such training, this nurse would not see students as
a parade of physical disorders, of stomach cramps and
headaches, but rather she would keep in mind other possible aspects
of the student's adjustment. The student whose stomach is
upset because of fear of failure would not only get bicarb, but
he would be referred to the Counseling Bureau.

What about the counselor? If he were trained in a program
which fully recognized the necessity of these three elements,
he, too, would be a more efficient personnel worker. Instead of
striving for status as a sort of junior psychiatrist, he would
clearly recognize the uniqueness of counseling as a service in
the educational setting. He would recognize its contribution
and its place in the development of the total educational
program. He would recognize the levels of counseling skill that are
possessed by his personnel work colleagues and by his fellow
educators. And, by the very recognition of levels of counseling
skills, he would build respect for himself among other staff
members on the campus. If, for example, he really believes
that college faculty members have a role in the counseling
process, then he would make intelligent use of such counseling
skills as they might possess. If he really believed that all of
the staff members of the institution had a common goal and
purpose, that of enabling the individual student to achieve
maximum learning from his total college experience, then he
could join with them under the banner of personnel work.

These two illustrations have been cited to point up some of
the difficulties that are involved in organizing a training
program for college personnel workers. Recognizing levels of
specialization and areas of specialization, and the nature of the
educational setting, will do much to produce an adequate
training program. The issues, then, can be stated simply as: Are all
persons who perform personnel functions to receive at least a
minimum of training in personnel work? If they are, how shall
that training be organized so that each may attain competence
in his specialty? And how can that training be organized to
provide for workers at various levels of competency? Finally,
how can the training be planned so that students become
familiar with the setting in which they shall work?

It is easy to raise issues when you are not charged with the
responsibility of providing the answers. We find it more
difficult to fulfill the second part of our assignment today, that of
identifying trends in the training of college personnel workers.
The spotting of trends is frequently a combination of limited
observation and pious, hopeful thinking, a mixture which is
not always known to the mixer. Therefore, the following
observations are offered without full knowledge of the ingredients
involved but with the hope that they may furnish food for the
discussion period.

We believe that one trend is an increasing emphasis upon
practical supervised experiences, particularly in the training of
counselors. These experiences appear to be limited to one or
two types within the collegiate institution. A few personnel
workers are urging that trainees be given a wide variety of
experiences before being permitted to follow a specialization.
Williamson, for example, has urged that counselors be given
interviewing experiences in community agencies, business
personnel offices, mental institutions, reading clinics, vocational
guidance clinics, psychotherapy clinics, and in elementary and
secondary schools as well as in the collegiate setting. The
internship experience, he believes, should be integrated with the
entire period of academic training, and not just tacked on at
the end of the formal course-work. A few institutions already
are moving in that direction.

Another promising trend appears to be an increasing
recognition of the need to analyze training content in terms of actual
job function, thus lessening the disparity between one's
training and what one actually does on the job. The USES study
of educational personnel jobs which was reported by the CGPA
Study Commission at this convention should provide a base from
which to do more intensive job analysis work. The proposed
pilot study of CGPA which was also reported on Tuesday
may provide us with techniques and tools by which we can
validate training programs against job success criteria.

Recognizing a third trend, some training programs are now
providing opportunities for individuals to evaluate and
improve their own human relations skills while in training. In
short, they are being provided with personal counseling
experience, and with group therapy. Please note that we are not
advocating that all future admissions officers, counselors and
deans be psychoanalyzed while in training. We are simply
saying that while they are learning the skills and techniques of
personnel work they are also learning to handle their own
problems so they do not interfere with the application of those skills
and techniques.

A fourth trend appears to be the development of in-service
training as a function of student personnel administration, and
the recognition of the advantages of coordinating this with the
graduate training programs. We have customarily thought of
in-service training as a program for graduate students doing
part-time counseling in the dormitories, or for members of the
teaching faculty who have agreed to work with students
beyond the boundaries of traditional academic advising. Yet all
of us could profit from a well-conceived, long-range in-service
training program to help us improve existing skills as well as
to develop new ones on the job. Here too, the training must
recognize and provide for levels of specialization, areas of
specialization and the settings in which the jobs are being carried
out. The knowledge to be gained from coordination between
the full-time in-service and the graduate phases of training
should be of increasing assistance in narrowing the gaps
between training content and job requirements.

Finally, a fifth trend seems to be increased emphasis upon
the philosophy of personnel work. The publication of the Joint
Committee on Counselor Preparation, in which ACPA
participated, recommended training in the philosophy which
undergirds personnel work. It is clear that a training program needs
a sound and carefully defined philosophical base. We believe
that there are only a few persons who hold to the mechanistic
bag-of-tricks approach to personnel work. Personnel work can
never succeed if its practitioners build their strength upon
technical knowledge to the exclusion of basic human values.

College personnel workers this year have a particularly
challenging task: aiding the largest graduating class in the
Nation's history to take their place in the national economy.
About a half million people will receive bachelor’s and higher
degrees this year, considerably more than last year's record
total of.^],rc-o (The 1948-49 total was nearly one-third higher
than the 1947-48 graduation figure and nearly double the
pre-war peak reached in 1939-40.)

These large graduating classes, of course, result from the
post-war boom in college enrollments, stimulated by the G.I.
training program. Enrollments reached a peak of 2,456,000 in
the fall of 1949, one million higher than the pre-war record.
The number of students enrolled and the number who get
bachelor's degrees will probably drop for several years after
1950, as the veterans move out of college into the labor market.
However, the number of master's degrees and doctor’s degrees
granted should continue to increase for a few more years. And
the drop in college enrollments will be only temporary. By the
late 1950's, enrollments will begin to rise again, as the first
"war babies" reach college age. The long-run trend for a larger
and larger proportion of young people to continue their
education beyond high school will also tend to push enrollments up.
The great majority of young people leaving college in the near 
future, like most graduates of previous yems, will seek jobs in 
professional, acmiprofessional, and administrative fields, In 
1950—probably also in 1951 and 1951 -many will be unable to 
find jobs immediately in the occupations for which they have 
been trained. There are several reasons for this unhappy pros¬ 
pect. Tlie war-time and post-war shortages in a number of occu¬ 
pation® have now been filled, The unprecedented numbers of 

& 
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new graduates will intensify competition for jobs Furthermore, 
there will probably be somewhat fevvei job openings for new 
college giaduates in 1950 than in the fiist post-wai years or 
even last year 

The Nation's economy is currently operating in high gear,
and it is likely that employment will continue at about the
present high level for the rest of 1950. However, unemployment
may increase somewhat, since the American labor force
(including both employed and unemployed) is growing at the rate of
600,000 to 700,000 workers a year. This situation presents a
challenge to business and industry to utilize fully our increasing
supply of potential workers, produce more goods and services,
and bring about a rise in our national standard of living. In the
long run, I am confident this goal will be achieved; but in the
next year, the atmosphere in which college graduates will be
seeking jobs is likely to be less favorable than at any time since
the war.

Such general observations about conditions in the job market
obscure widely varying situations. Prospects are excellent in
some occupations, though, in others, graduates will face stiff
competition for jobs.

In teaching, for example, there is at once an acute shortage
of personnel in the elementary schools and a growing
oversupply at the high-school level. For the current school year,
only one elementary teacher was trained for every three who
were needed. On the other hand, 4 times as many students
completed training for high-school teaching as were required.
This imbalance in supply exists in nearly every State, creating
a grave problem both for the schools and for the young people
concerned. College counselors can help to remedy the situation
by getting the facts on employment outlook before prospective
teachers as early as possible in their college careers.

Other professional fields in which stiff competition for jobs
is expected in the next few years include:

Law. This profession is already overcrowded and likely to
become more so during the next few years. Twice as many
lawyers passed the bar examinations in 1949 as in the years
just before the war; unprecedented numbers are currently
enrolled in law courses.

Engineering. In the early 1950's, the number of graduates
will exceed the number of openings in this rapidly growing
profession. However, after the next few years, the
employment situation for new graduates is likely to improve.

Chemistry. Competition for positions will be keen during the
next few years among chemists without graduate training.
The outlook is better for those with graduate degrees.

Journalism. The reporting field, always highly competitive,
is likely to become more overcrowded in the early 1950's. Jobs
will be easier to get with country papers, trade papers, and
house organs than with "dailies."

Personnel work. Competition is very keen in this field.
Employers are insisting on much higher educational and personal
qualifications for positions at all levels than in the previous
five or six years.

There will probably also be an oversupply of business
administration graduates. A surplus of new graduates has already
developed in the field of accounting.

Liberal arts graduates with specialized training or work
experience will find it easier to get jobs than those with only a
general undergraduate education.

Fields offering good prospects for new entrants include:

Nursing. A shortage exists despite the fact that there are
more nurses than ever before. The demand for nursing service
will probably continue to rise.

Medicine and Dentistry. Those able to enter and complete
training will have good opportunities. However, competition is
very keen for admission to professional schools. Some new
schools are opening; more are planned for later in the decade.

Pharmacy. This is a field in which the supply of new
graduates has almost caught up with the demand. It is expected that
this profession will be overcrowded in the long run if
enrollments in pharmacy colleges continue at present high levels.

Other occupational groups important in health service, such
as veterinarians, medical x-ray technicians, medical laboratory
technicians, dental hygienists, physical therapists, occupational
therapists, and dietitians are expected to have good
opportunities for a number of years. Women with interest in the medical
field will find many openings in most of these occupations.

Social work. Current employment opportunities are
excellent in all types of positions. The long-run outlook is good for
workers with graduate training, but those with only
undergraduate training will face increasing competition.

Psychologists with graduate training, particularly in clinical
work, will find good opportunities in the next year or two.
However, those with only the master's degree can expect
increasing competition. Some psychology majors with the
bachelor's degree are having difficulty gaining admission to
graduate training.
Many 1950 graduates who have taken training for
occupations that are, or soon will be, overcrowded will need your
expert help in adjusting to the situation.

For some, the best course may be to take a job in a related
field. Thus, many engineering graduates may be able to put
their training to use in administrative or technical sales jobs.

For others, the wisest course will be to continue in school for
postgraduate work in the same or related fields, in order to
improve their chances for employment. This is in line with the
long-term trend toward constantly rising standards of
educational preparation in many occupations. In engineering, for
example, many people with little, if any, college education used
to qualify for professional positions on the basis of their
practical experience. Now, it is much harder to do this; most openings
in the profession are filled by men with bachelor's degrees, and
the number of engineers with graduate training, although small,
is increasing. The same trend toward graduate training can be
noted in many other professions. In addition, the proportion of
sales, clerical and administrative occupations for which a college
education is required or preferred has been growing rapidly.

Job opportunities in professional and administrative
occupations may be somewhat better for graduates who come out of
college a few years hence, after the current peak in college
graduations has been passed. Employment in the professions
has grown rapidly, from 3½ million in 1940 to over 4 million in
1949. It may well increase to more than 5 million by 1960.
Employment in administrative occupations has likewise shown
an upward trend. In addition, many new graduates will be
needed yearly to fill vacancies arising because of death,
retirement, marriage, or transfer to other occupations; probably more
will be hired as replacements than for new jobs. Nevertheless, if
college enrollments increase in line with past trends, there will
continue to be keen competition for positions in most
professional and administrative occupations. This will be even more
true if enrollments expand as much as has been recommended by
some educators.

Since opportunities will be better in some fields than others,
students will need realistic information on employment
prospects in different occupations; they should have this before
they enter on a course of training for any field. This
information needs to be up to date. During the past 8 months the
Bureau has been working on a new edition of the Occupational
Outlook Handbook. The information we have obtained (from
industry, organized labor, and professional societies in a great
number of fields) underscores the fact that the factors affecting
employment trends are constantly changing.

College personnel workers can, of course, do much to see that
young people have the needed information available to guide
them in making an occupational choice. They can also
contribute greatly to a solution of the broader problem of
overcrowding of professional and administrative occupations, by
helping students to widen their vocational horizons and
encouraging them to seek employment in a broader range of
occupations.
An Abstract

HAROLD E. SNYDER

Director, Commission on Occupied Areas, American Council on Education,

Washington, D. C.

Why should American educators be particularly concerned
with educational developments in Germany and Austria, in
Japan and the Ryukyus? What stake, as Americans and as
college personnel officers, have you in the reconstruction of the
ex-enemy countries and in the rehabilitation of their youth?

The answer is a simple one. It consists of three main points
which I believe to be irrefutable.

First, the time has passed, if it ever actually existed, when
the well-being of American youth can be assured by the
opportunities provided in our homes and communities, in our
schools and colleges. For thousands of our students two
terrible wars and the threat of a third even more terrible have
wiped out and can wipe out again all of the benefits of our
excellent educational system, all the splendid advantages with
which we are trying to provide them. Developments in other
parts of the world are of direct and vital significance to all
of us, and particularly to our youth.

Second, while the happiness and security of American youth
depend upon many factors, it is particularly essential that a
concerted effort be made to overcome the effects of the
perverted Fascist philosophy and education on the minds of
German and Japanese youth. These virile and technically adept
peoples must not again be permitted by our disinterest to
become sources of infection, infestation and eventually of
aggression affecting the whole world.

Poverty and unemployment, frustration and disillusionment,
indifference and indecision can once more cause German and
Japanese youth to be attracted by the blandishments of
totalitarian propaganda, can turn their despair into hatred, can
make them a threat to the security of American youth.
Enlightened self-interest demands that we be concerned with
aiding the process of educational reorganization and
reconstruction and of democratization in the occupied countries.

Third, world leadership has been thrust upon us. By the
very fact of our occupation of the ex-enemy countries, we have
assumed a very special responsibility for what happens there.
In the eyes of the entire world Germany and Japan are proving
grounds for the democratic principles which we profess, for the
efficiency of our methods, for the sincerity of our motives. We
dare not, therefore, fall into the sometimes tempting illusion
that these countries can be given identical and equal treatment
with all other countries with which we maintain cultural
relations. The question is not one of favoring our former enemies.
It is obvious that they must not be coddled. But it is equally
obvious that if we are to discharge our special responsibilities
there, and safeguard our national interests, these countries
must continue for some time to come to receive special attention.
An Abstract

MAURICE E. TROYER

Vice President, in Charge of Curriculum and Instruction, Japan International Christian

University Foundation

Almost 100 years ago, in 1852 to be exact, the United
States officially, through her Navy, opened feudal Japan to world
trade, industry, and technology. In the 100 years that followed,
Japan learned her lesson well, perhaps too well. Today official
United States is again in Japan to help with the
democratization of her schools and government. Much has already been
accomplished in the reorganization of education and
government; much remains to be done in educating leaders with a
clear understanding of democratic philosophy and processes.

The new International Christian University now being
established in Japan, independent of the Allied Occupation
Government, has as its dominant aim the education of leaders who
will look upon academic knowledge and skills, not as ends in
themselves, but as tools useful in working toward: (a) the
social order that holds sacred the integrity, worth, and welfare
of the individual, and (b) group processes of thinking which
provide the basis for enlightened decision and action but which,
nevertheless, respect and duly protect the rights of individuals
and minorities to pursue their objectives through constructive
educational processes.

The University will open in April, 1952, with one
undergraduate college and three graduate schools. The major
purpose of the undergraduate College of Liberal Arts is to
experiment with and demonstrate approaches to general education
appropriate to the needs and life of Japan. Traditionally,
specialization in Japanese education starts at the high-school level.
General education has been unknown in colleges and
universities of Japan. It is proposed that the program of general
education in ICU will include not only natural and physical
sciences, social sciences, and humanities, but also agriculture
and homemaking, not to prepare specialists in these two latter
areas but to bring the contribution of these two areas to the
life of the campus. The graduate program includes a Graduate
School of Education, a Graduate School of Citizenship and
Public Administration, and a Graduate School of Social Work
to prepare leaders for three areas of public service: education,
government, and social service.

One of the vice presidents of this new university is to
administer and coordinate the student personnel stream of
activity: recruitment, selection, admissions, registration,
orientation, vocational and educational counseling, clinical services on
problems of social and emotional adjustment, health services,
housing, student social activities, placement and follow-up.

Educational leaders in Japan have declared that there is no
one among the colleges of Japan qualified to handle this
position. It will, therefore, be filled by a highly qualified faculty
member from one of our leading universities in the United
States, who will also head up the program of graduate training
in personnel and guidance. This position holds unusual
opportunities for pioneer work in the development of new programs,
processes, and techniques of guidance in a different, but
certainly not new, culture and language setting.

Plans for the development of the program for the university
are as follows. The beginning faculty, in April 1952, will
consist of about sixty staff members. Since the major function
of this University is graduate in nature, faculty members will
have completed their doctorate program and at least
three-fourths of them will be persons of recognized status in their
field. About half of the faculty will be Japanese, the other half
from other countries.

A number of the non-American members of the faculty are
to be selected and brought to the United States on fellowships
for the academic year 1950-51. About 40 of the faculty
members, half Japanese and half foreign (non-Japanese), are to be
selected by June, 1951, at which time they will be brought to
the United States and assembled on some university campus
for seven months of planning. This planning session is
important. A new faculty representing different institutions,
countries, and cultures will need time to think together on the
objectives of this university and to build programs and courses
which support those purposes. A new system of student records,
a library, new equipment: these are all problems for study and
development.

In the achievement of these purposes of the planning
conference, the faculty will have an unusual opportunity to learn
ways of democracy and Christian brotherhood in their own
personal relationships. This is indeed an important pervasive
objective of this planning period. The faculty of this new
university should not unduly confuse their students by
discrepancies between what they teach and how they behave in relation
to each other.

In January of 1952, these faculty members, together with
others who will be added in the meantime, will assemble on
the campus at Mitaka, fourteen miles out of Tokyo, and
prepare to open the university in accordance with the Japanese
academic calendar in April, 1952.

In the meantime, classrooms, offices, library, and residence
centers for faculty members and students will be provided
through a building program costing about three and one-half
million dollars. Three hundred thousand dollars have been
budgeted for new books and magazines for the library, more than
$[?]00,000 for equipment. Financial plans for the university
projected by the Japan International Christian University
Foundation in America provide for a reserve of $5,000,000 to be
used as general endowment and a substantial sum to subsidize
fellowships and scholarships.
SOME STATISTICAL PROBLEMS IN CLINICAL

RESEARCH 1

ROBERT R. HOLT 2

The Menninger Foundation, Topeka, Kansas

The title of our round table carries the implication
(intentionally, I believe) that clinical research has special aspects
which make it somewhat different from most psychological
research and which pose certain statistical problems with unusual
urgency. Let us first, therefore, take a look at the kinds of
things clinical research has traditionally referred to.

The original meaning of the word clinical, I am told, was
bed-side. The clinical practitioner was the practical man who
dealt actively and most directly with patients, and thus had to
sharpen his sensitivities to everything about his charges that
might indicate movement toward sickness or toward health.
Research and practice were almost indistinguishable aspects of
the same role, for the keen observation that noted similarities
in different patients led both to the development of individual
therapeutic skill and to the slow amassing of shared knowledge.
Thus, to call research "clinical" is to imply first that it has
intimately to do with the active dealings of doctors and patients,
and that it preserves the clinician's passion for richness of
concrete detail.

As clinically oriented researchers began turning their
attention to people who were not obviously ill, it became accepted
that research may be called clinical even when it does not deal
with patients. The true physician's approach, which looks
toward the whole man in his unique individuality, and the
hierarchy of scientific values, which puts fidelity to life, keenness
of observation and adequacy of concepts to deal with human
problems before the more conventional canons of quantitative
precision, objectivity, ready demonstrability, and controlled
conditions: these came to be connoted by the word clinical. It
is in this meaning that the term clinical research is used here.

1 A somewhat shortened version of this paper was presented at the Symposium
"Problems of Statistical Method in Current Clinical Research," jointly sponsored
by the Psychometric Society and the Division of Clinical and Abnormal Psychology
at the APA meetings, September 1947. The other participants were P. J. Rulon,
R. M. W. Travers, and Daniel Horn; E. L. Kelly was chairman.

2 The ideas in this paper were worked out in many discussions with colleagues of
the Research Department of the Menninger Foundation, to whom I should like to
acknowledge my heavy indebtedness. I am particularly grateful to Roy Schafer and
George S. Klein for their assistance and criticism.

Statistical Problems Implied in Three Principal Forms of

Clinical Research

Historically the first kind of clinical research in psychiatry
and psychology followed the lines set down in medicine.
Patients whose symptoms were grossly similar would be observed
and the findings collated. From this kind of research in
psychiatry, men like Kraepelin reduced the bewildering variety of
concrete manifestations of mental illness to a comprehensible
schema. The resulting nosology may be creaking and
inadequate to the demands made on it today, but it was a real
achievement of clinical research in its time.

Research of this kind has never ceased. It is a basic kind of
inquiry into the nature of human beings, prerequisite to
advancement in other kinds. A psychiatrist who has the good
fortune to observe a succession of fetishists, for an example, will
publish a summary of the common features of these cases. For
research of this stamp, the barest mathematical staples will
suffice if any at all are needed: the simple arithmetic of sums,
ranges and averages. Even such a refinement as a standard
deviation would be out of place if you were working up your
observations on [number illegible] cases of some unusual ailment, such as true
paranoia.

A second type of clinical research is modelled on the medical
experiment of treating patients with a certain drug and
comparing their subsequent status with that of an untreated or a
differently dosed group. When we wish to establish the precise
effect of any experimental condition (let us keep leucotomy in
mind as a concrete instance), we get right into the complexities
of experimental design. All the usual means of demonstrating
significant differences become relevant, notably t-tests for
comparing means or chi-square for comparing frequencies of scores
on tests pre- and post-operatively. Essentially similar in formal
structure are problems of differences between various clinical
groups with respect to experimental criteria.
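
As a rough illustration of the two tools just named, the following sketch computes a pooled-variance t for two independent groups and a Pearson chi-square for a table of frequencies. The function names and every number in it are invented for the illustration; nothing here is taken from an actual leucotomy study.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Student's t for two independent groups, using a pooled variance.

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / dof
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, dof


def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows.

    Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical test scores for a treated and an untreated group.
treated = [2, 4, 6]
untreated = [1, 2, 3]
t_value, t_dof = pooled_t(treated, untreated)

# Hypothetical counts: rows are groups, columns are "improved" / "not improved".
chi_value, chi_dof = chi_square([[20, 0], [0, 20]])
```

The statistic must still be referred to the appropriate table with the stated degrees of freedom; the sketch stops short of that step.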

How do investigations of this kind differ from the
agricultural examples that lend Fisher's books what bucolic flavor
they have? When different fertilizers are applied, or different
strains of wheat used, there may be many variables which need
to be controlled, just as in the psychiatric model: soil
constituents, rainfall, exposure to the sun, in place of length of
hospitalization, premorbid intellectual level, or presence of organic
illness. But how much easier it is to measure the yield of a
patch of wheat than the degree of recovery in a group of
leucotomized schizophrenics! Of course, one can be concerned
about the wheat's height and the volume as well as the weight
of the harvest. The patterning of these variables, however, is
not likely to be considered significant. In the psychological
experiment, by contrast, no one test score or clinical rating
alone is crucial, nor can it stand for the total effect; the change
in the configuration of such data must be analyzed. But how
does one test statistically the significance of a difference in
configuration?

A third kind of clinical research goes a step beyond the
establishment of differences, and tries to discover functional
relationships. Variables are assumed to exist on both sides of the
equation; the researcher tries to observe what happens to his
criterion (let us say the Rorschach test) as successive increases
in a human characteristic (such as anxiety) are studied. In such
a problem the psychologist thinks first of correlation, though
ideally one should work out the form of the relationship and
fit a curve of some kind to it also. But, again, if the true nature
of clinical research is to be respected, the problem cannot often
be so easily disposed of. Typically, the covariates will be a
syndrome on the side of the patient, and a pattern of test results
on the side of the criterion. What tools do we have for this kind
of problem? Multiple and partial correlation? All very well if
we are interested in only one variable on one or both sides, and
if no discontinuities appear, and patterning is unimportant.
Better not to have too many variables, however, because the
computational labor is most demanding and the formulas most
complex, while if relationships are curvilinear, the same
objections hold more forcibly.
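
The objection about curvilinear relationships can be made concrete with a small sketch. It assumes a purely hypothetical case in which a test score is a perfect U-shaped function of an anxiety rating; the variable names and numbers are invented for the illustration.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical anxiety ratings, centered on zero, and a score that
# depends on them exactly but in a U-shaped fashion.
anxiety = [-3, -2, -1, 0, 1, 2, 3]
score = [a * a for a in anxiety]

# The linear coefficient is zero although the dependence is perfect.
linear_r = pearson_r(anxiety, score)

# Correlating the score with the squared rating recovers the relation,
# but only because the form of the curve was known beforehand.
curved_r = pearson_r([a * a for a in anxiety], score)
```

The second coefficient is reassuring only when the form of the curve is known in advance, which is seldom the position of the clinical investigator.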

All of this talk about syndromes and patterns points to one
cardinal difference between clinical and classical experimental
research. It is taken for granted from the beginning in clinical
research that the significant data are found in meaningful
patterns, and that this patterning must be respected. Whether the
clinical psychologist or psychiatrist is working with a
brain-injured patient or a normal personality, he has to consider the
condition of his subjects as organisms, as people, and he has to
keep this totality in mind all along.

Two important consequences follow. First, the nature of
controls has to be quite different from that in the classical
experiment; second, the techniques of analysis must enable the
experimenter to deal with a shifting configuration of many
variables or parameters, which may undergo "phase changes" or
the emergence of new patterns in a discontinuous fashion.

Problems of Controls in Clinical Research

It is generally accepted that investigative science is
essentially a matter of making observations under conditions over
which the investigator has sufficient control to see causal
relations clearly. Classically, this control has meant the holding
of all important variables in a situation constant except one,
the experimental variable, and then recording the effects of
varying it on the criterion being measured or otherwise
accurately observed. Such a design makes for clear, crucial
experiments; it is easy to treat them statistically or even to derive an
empirical function from the results. Thus, if we are interested
in the properties of springs, we may first study the effect of
various weights in stretching a spring, holding constant the
nature of the spring used, the temperature, magnetic fields, etc.
Having established the empirical equation, we may then hold
constant the weight used to stretch the spring, and vary
temperature, observing accurately the extension caused by, say,
50 grams, at 10°, 20°, 30° C. and so on, leading to another
equation.
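
The "empirical equation" of the spring example can be sketched as an ordinary least-squares line through the observations. The weights and extensions below are invented, and a straight line is assumed only because it is the simplest form such an equation could take.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical readings with everything but the weight held constant.
weights_g = [10, 20, 30, 40, 50]
extension_mm = [2.1, 3.9, 6.0, 8.1, 9.9]

slope, intercept = fit_line(weights_g, extension_mm)


def predicted_extension(weight_g):
    """Empirical equation: extension as a linear function of weight."""
    return intercept + slope * weight_g
```

A second series of readings, with weight fixed and temperature varied, would be fitted in the same way to give the second equation.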

A recent trend in psychological research is the use of factorial
design. Recognizing the facts of interaction between
psychological variables, it enables the experimenter to measure them by
means of designs in which there is controlled change in more
than one variable. It is a highly efficient type of design,
permitting as it does the investigation of more than one possible
cause of an effect in which we may be interested, as well as
interactions between causes; where applicable it is to be
preferred over the classical univariate design. It still requires more
by way of direct manipulative control than the clinical
researcher often has at his disposal, however. One must be able
to arrange things so that given values of two or more variables
or conditions will be attained simultaneously. It also has the
disadvantage of limiting our attention to a single numerical
score or index as a unit of analysis, out of several possibilities.
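
What a factorial design yields can be shown with a minimal sketch of a two-by-two arrangement. The two conditions, the cell scores, and the function name are all hypothetical; the effects are computed from cell means in the usual way, with the interaction taken as half the difference between the effect of the first condition at the two levels of the second.

```python
def factorial_2x2_effects(cells):
    """Main effects and interaction from a 2 x 2 design.

    `cells` maps (a_level, b_level), each 0 or 1, to a list of scores.
    Returns (effect_of_a, effect_of_b, interaction).
    """
    m = {key: sum(scores) / len(scores) for key, scores in cells.items()}
    effect_a = ((m[(1, 0)] + m[(1, 1)]) - (m[(0, 0)] + m[(0, 1)])) / 2
    effect_b = ((m[(0, 1)] + m[(1, 1)]) - (m[(0, 0)] + m[(1, 0)])) / 2
    # Effect of A when B is present, minus effect of A when B is absent.
    interaction = ((m[(1, 1)] - m[(0, 1)]) - (m[(1, 0)] - m[(0, 0)])) / 2
    return effect_a, effect_b, interaction


# Hypothetical scores under two conditions, each absent (0) or present (1).
cells = {
    (0, 0): [10, 12],
    (1, 0): [14, 16],
    (0, 1): [11, 13],
    (1, 1): [22, 24],
}

effect_a, effect_b, interaction = factorial_2x2_effects(cells)
```

Note that every cell must be filled by arrangement, and that each cell holds a single score per subject: the two limitations mentioned in the paragraph above.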

In much clinical research, not only can we not hold all
relevant conditions constant except one, we must accept whatever
variations occur, powerless to arrange them neatly beforehand.
Since we cannot simplify our task by the usual means of
control, we seek the control that is given by as exact knowledge
as possible of the values of the uncontrolled variables, as we
find them. In studying the effects of ego-involvement on levels
of aspiration, the clinical researcher knows that he cannot
insure that the many facts of personal history that may affect
the criterion, statements about goals, will be the same for all of
his subjects. Consequently, he tries to make the best of a bad
situation and find out as much as possible about the people who
are his subjects. He finds himself dealing with a complicated,
if not tangled, web of inter-related factors, particularly if he
chooses to observe more than one criterion aspect of behavior.
And it does tend to be characteristic of clinical research to woo
complexity of this kind too.

Of course, the fact that the subjects of study are human
beings, often sick ones who are looking for help, dictates that
human considerations must come before logical and
mathematical ones. We cannot manipulate people in important ways
just to help along the nicety of our controls. Furthermore, very
often when we try to do so (when we create artificially
"simple" situations, for example, where we may fancy that control
of the classical kind has been attained) we find that we are
dealing with different subjects, who are reacting artificially to
an artificial set-up. When we try to experiment in this way on
any of the really important aspects of human emotional life
we are likely to create an awkward and false atmosphere. That
is part of the meaning of the statement that existing
configurations have to be respected in clinical research.

Problems of the Statistical Analysis of Complex Data

The clear consequence is that much more subtle and devious
statistical methods than usual are needed if we are to try to
unravel the causal nexus, to remove the effects of inescapable
confounding of variables. It is out of the question to report
clinical research of this character by the formula for an
empirical function. Most of the techniques I am familiar with, such
as multiple and partial correlation, analysis of variance and
covariance, seem to offer promise and yet to bring up serious
difficulties.

Difficulties Attending the Use of Familiar Techniques

To begin with, (1) the numbers of subjects are inevitably
limited in any research that seeks to gauge the important
aspects of each subject's personality. With few degrees of freedom
to work with, the more complicated correlational methods
become unusable. The attempt to get larger N's through lumping
together essentially different groups, or skimping on the
thoroughness of study, destroys precision instead of increasing it.
(It is true, the trend seems to be toward cooperative clinical
research. Many hands can often get together sufficient data
on rather large numbers of cases, for many kinds of clinical
research, if personnel and money are no problem.) (2) Problems
of non-linearity often vex correlational analysis. (3) Turning
to the analysis of variance, the experimenter is all too likely to
find that his data are not homoscedastic: variance is not
uniform enough throughout the tables of results for as complex an
analysis as he wants.3 The end result may be that the researcher
whose statistical sophistication extends no further than mine

1 During tins round ulilc tlutcuiMoti, Dr Phillip Union tupBestcd m reference to this 
point that n transformation of ito tint it (« bv converting mem into logwttnmsj oiten 
overeoiMH tlieir li etc rowed iwiicHy unci nukes tins anmlym of variance nppiicnoi 
without iifTcctuig P values 
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may find himself using simple methods anyway, willy-nilly 
At the same time he will recognize that he is losing much of the 
richness of his data, but not know what to do about it What 
js worse, he may feel impotently aware that many of the dif¬ 
ferences or lelationships that he obtains may be exaggerated 
or underplayed because of the confounding effect of other van- 
ables, the effects of which he was not able to remove oi hold 
constant 
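The logarithmic transformation mentioned in the footnote on heteroscedasticity can be illustrated with a minimal sketch. The three groups of scores below are invented for illustration (each is the previous one multiplied by ten, so that spread grows with level); the sketch shows only that taking logarithms brings such group variances together, and it claims nothing about any real clinical data.

```python
import math
from statistics import variance

# Hypothetical scores: each group is ten times the previous one,
# so the spread grows in proportion to the mean (heteroscedastic).
groups = {
    "low": [1.0, 2.0, 4.0],
    "middle": [10.0, 20.0, 40.0],
    "high": [100.0, 200.0, 400.0],
}

def variance_ratio(samples):
    """Largest group variance divided by the smallest group variance."""
    variances = [variance(values) for values in samples.values()]
    return max(variances) / min(variances)

# Convert every score to its base-10 logarithm.
logged = {
    name: [math.log10(value) for value in values]
    for name, values in groups.items()
}

raw_ratio = variance_ratio(groups)   # group variances differ enormously
log_ratio = variance_ratio(logged)   # group variances are nearly equal

print(f"raw variance ratio: {raw_ratio:.1f}")
print(f"log variance ratio: {log_ratio:.3f}")
```

A transformation of this kind helps only when the spread is roughly proportional to the level of the scores; that proportionality is built into the invented data here.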

Not all clinical research fits this model, of course. Some of it
deals with such complexities that measurement of any kind seems
futile or impossible, for example, psychiatric research on the
dynamics of a neurosis. At other times it more closely
approximates the classical experimental model. But the hallmark
of clinical research, generally speaking, is that it tries to
deal with complex subjects in complex, more or less natural
settings.4 The general statistical problem of most clinical
research, then, is roughly the same: how to deal with highly
patterned, interrelated data.

Let me give some more specific examples of this kind of
statistical problem. I have just been working with the problem
of quantifying self-insight. Even though I had self-ratings and
criterion-ratings on a large number of personological variables,
and though something about their patterning in each subject was
available from case studies, I was forced to use a summative
atomistic measure of insight. Aggression might be the most
crucial problem for this man, and relatively unimportant for
that one, but it had equal weight with other needs, some of
which might be quite trivial in the lives of both. Should a
system of differential weights have been used? It would have
been clumsy and approximate and would in no way have expressed
the patterning of the needs in each man: their fusions,
conflicts, subsidiations, or hierarchical relationships.

4 Here some note has to be taken of the many thorny problems
contained in the easy words simple and complex. As a rough first
approximation, let me offer these considerations: (1) Something
is simple if it requires a few concepts or few coordinates to
describe its structure, complex if it requires many. (2) An
event is simple if it can be explained to a certain margin of
error by a few determinants, complex if you need to isolate many
determinants in order to reach the same precision. A further
significance assumed here to be implied by complexity is the
number of relationships between the parts of a whole, and their
degree of order (hierarchy, symmetry). I have not intended to
imply that there is any constant relationship between the
concepts simple-complex and artificial-natural. It should be
apparent, also, that simplicity is always relative to one's
purpose and approach; a rose may be simpler than a symphony to
an artist, more complex to a physicist.
Consider a simpler problem: expressing the degree of matching or
agreement between two series of simple patterns. A somatotype is
a simple pattern of three numbers, each number representing the
degree to which one of three components of physique is present.
Two somatotypers make independent ratings of a group of
subjects, and wish to find out how reliable their judgments are.
They can, of course, correlate each component separately, and if
the correlations are all plus one, the patterns must be in
agreement (excluding constant bias). With lower but still quite
respectable separate reliabilities for the three components
there can be quite a lot of important disagreement over which
component is dominant in any one physique. There is a good deal
more difference between a 2-4-5 and a 2-5-4 than there is
between a 2-4-5 and a 2-3-4; but how can such very simple
pattern differences be handled statistically?5
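The difficulty with the 2-4-5, 2-5-4 and 2-3-4 patterns can be made concrete with a small sketch. It assumes two illustrative measures that are not proposed in the text: an ordinary Euclidean distance between ratings, and a check of which component is dominant. The point is only that a component-by-component distance treats the two comparisons as equal, while the dominance check separates them.

```python
import math

def distance(a, b):
    """Euclidean distance between two three-component ratings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dominant(rating):
    """Index (0, 1 or 2) of the component with the highest rating."""
    return max(range(len(rating)), key=lambda i: rating[i])

reference = (2, 4, 5)
swapped = (2, 5, 4)   # same numbers, different dominant component
shifted = (2, 3, 4)   # lower ratings, same ordering of components

d_swapped = distance(reference, swapped)
d_shifted = distance(reference, shifted)

same_dominant_swapped = dominant(reference) == dominant(swapped)
same_dominant_shifted = dominant(reference) == dominant(shifted)

# Both distances are equal, yet only the swapped rating changes
# which component dominates the physique.
print(d_swapped, d_shifted)
print(same_dominant_swapped, same_dominant_shifted)
```

Any index built by summing over separate components shares this blindness to ordering; a pattern measure has to use the relations among the components as well.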

When we deal with a much larger array of numbers, such as are
found on the formal psychogram or summary sheet for a Rorschach
test, or the subtest scores of the Wechsler-Bellevue scale, how
much more hopeless it seems to try to relate such patterns to
anything, mathematically! Yet one important kind of clinical
psychological research, the validation of tests, has to approach
these data from a configurational point of view. Consider one of
the simplest problems in Rorschach validation, since a single
criterion, IQ, is involved: the estimation of intellectual
level. The books tell us that the number of whole responses,
especially well-organized ones, the number of movements and
their quality, the accuracy of form perception, and the number
of good original responses (among other things) are positively
related to intelligence. But the patterning is such a crucial
matter that, as Klopfer remarks, only the F+ per cent gives an
appreciable correlation when these factors are correlated with
IQ. The most subtle application of multiple-regression methods
would not help much, either, since there are so many other
things than intelligence which can affect any of these
variables. Yet judgments of skilled analysts of the test, taking
the above and many other aspects of the record into account in
their interrelations and mutual implications, agree fairly well
with intelligence tests.

5 Dr. Rulon said in the course of the discussion that multiple
correlation (R) would handle this problem adequately, though it
was true that, even so, one must assume no constant differences
between raters.

B. Special Difficulties in Validating Psychological Tests

When we turn to the validation of diagnostic conclusions from
any one test, or especially a battery, the statistical problems
pop up from behind every test score.

There are a number of important problems that may properly be
called pre-statistical. First of all, there is the group of very
vexing problems of reducing clinical data to quantitative form
in which they can be handled by statistics. Few research
procedures in the field, even including tests, result directly
in meaningful numerical indices. Ratings are often resorted to
of necessity, with all the headaches that they bring. The
principal statistical problem involved is getting numbers which
mean something more than a denotative or ordinal scale.

Second, it must be recognized that test patterns do not indicate
directly the presence of particular psychiatric diseases.
Rather, they reflect discrete aspects of personality, of
intellectual functioning, of thought organization. Therefore,
they must be validated against these aspects rather than against
a notoriously unreliable nosology. If, on the contrary, we were
to follow the advice of those who urge the validation of
Bellevue scatter analysis through multiple correlation, not only
would our diagnoses be bare statements rather than pictures of
personality under the effects of illness, but we should be tied
to the nosology on which the original validation study was done.
There has been for some time a current of growing
dissatisfaction in psychiatry with the standard nosology;
psychologists would be setting their faces toward the past if
they ignored this trend. We must try, then, to learn the
relationships between our test data and the elements of mental
disorder (such as anxiety, projection, psycho-motor retardation)
which are variously patterned in different nosologies. We need
differently designed validational studies, which will tax more
severely the kind of clinical collaborator for whom diagnosis is
only label-giving, but which will be easier for the good
clinician, who is usually surer of his estimations of such
phenomena as tension and obsessive thinking than he is of the
diagnostic pigeonhole into which he has to cram them.

In the third place, there is the fact that validational studies,
if done only by the use of clear-cut cases, do not relieve the
clinician of the need for considerable ingenuity, and artistry
if you will, when he comes to grips with the mixed picture that
the everyday case presents. He may be able to find in the book
the Bellevue scatter patterns to be expected for intra-cranial
pathology, for hysteria, and for schizophrenia. But how is he
going to be able to use them in the diagnosis of a schizophrenic
process developing in a person of hysterical character make-up
after trauma to the central nervous system? Variations in the
test scores of most patients can be attributed both to present
and to pre-existing modes of adjustment. When patterns are
imposed upon patterns in these ways, can one not be forgiven for
despairing of the possibility of diagnosis through multiple
regression, or even of the possibility of ever fully validating
everyday diagnostic use of tests?

The fact is that that marvelous unconscious statistician, the
human brain, can learn to separate such overlapping patterns and
make sense out of them. It is for this reason that the clinical
psychologist unabashedly relies on what experience teaches far
more than on what can be demonstrated statistically to the
satisfaction of meticulous methodologists. Experience can take
into account the crucially important qualitative analysis, what
Rapaport calls "the tune of the record," which, by its very
nature, can hardly ever be treated mathematically. It can take
advantage of unconscious learning, the effects of subliminal
recognition of cues of whose nature the clinician may be
unaware. Even though that kind of so-called intuitive or
artistic diagnosis may not directly contribute to science, and
though it is difficult to pass on to others, its successes must
occur before they can be subjected to systematic study and their
bases finally discerned. Perhaps, then, the experienced
clinician may be forgiven for an attitude toward statistics
which often approaches the condescending. He knows that the
methods of diagnosis his experience has taught him are under a
constant validating check, their agreement with clinical data.
He knows also that statistical researches can give scientific
respectability to some of this knowledge, bit by bit, following
at a considerable distance behind, but he has yet to hear of any
contribution to diagnosis that was made by statistical research
before it had been discovered by the working clinician.

But let us suppose that these prestatistical hurdles have been
successfully leaped, and a study is under way to validate
certain supposed test indications of schizophrenia. Let us
suppose, further, that the cases have been divided clinically in
the way just recommended: according to aspects of the
disintegration of control over thought and emotion, for example.
We come right up against the fact that different indicators mean
particular aspects of schizophrenia in different cases. In one
patient, the chaotic response to and use of color in the
Rorschach test may indicate abandonment of a large degree of
emotional control, while the TAT is relatively stereotyped; in
another case, the Rorschach may be quite coarctated, giving no
hints about the emotional status underneath the surface
inhibition, while wildly aggressive and sexual fantasies in the
TAT indicate again the deterioration of control over emotion.
Furthermore, one cannot reason diagnostically from the lack of a
sign beyond a limited extent; just because a wealth of distant
word associations indicates schizophrenic disorganization, an
orderly association test does not necessarily prove the lack of
such disorganization.

Why are these statements true? Certainly not because determinism
in mental life is lacking or capricious. Rather, it seems to be
that the structure of the human organism, particularly of its
psychic aspect, is an exceedingly complex matter of checks and
balances. If A gives, but B and C hold, then no matter; again,
no matter how strongly B and C stand fast, if D goes, then all
is lost. We know in studying patients' histories that a
particular trauma, such as the accidental death of a parent in
front of one's eyes, gives every sign of being directly
pathogenic in one case, while this and other traumas may be
piled on another, constitutionally strong person, or perhaps one
with a good infancy, and only a slight degree of pathology
occurs.

It seems to be doubtful, then, that in clinical research in
etiology or anything else psychological, Koch's postulates can
be "the tables of the law," as Dr. Kubie said in the 1946
Orthopsychiatric Round Table on Clinical Research.6 These
postulates, you will remember, are: "1) we will have to find x,
the suspected cause, in every patient suffering from a specific
disease; 2) this x must be found in such patients only; and
finally, 3) the experimental introduction of x into the host
must produce the disease." Dr. Kubie believes that the first two
principles are applicable to mental disease, even though
determinants of human behavior are manifold, and though it is
difficult to distinguish in psychiatry between what is the
essential content of a disease and what is saprophytic or
secondary content. Large numbers of cases will clear these
matters up, he believes. But consider the well-known hypothesis
of Freud, that homosexual conflicts are a specific etiological
element in paranoid projections. Grant for a moment that it is
possible to satisfy the first requirement, and that this
suspected cause can be found in every paranoid case (although
there are many clinicians who will not grant its universality).
It still does not follow that homoerotic conflicts are found in
such cases only, or that where such a conflict is induced, the
paranoid symptoms always occur. The issue cannot be solved by
objecting that it has to be just one particular kind of
homosexual problem; almost all degrees of acceptance and
awareness of these forbidden impulses may be found in paranoid
patients at the time the symptoms develop.

It seems that we must reformulate the principles laid down by
Koch in the light of modern conceptions such as the one
championed by Bellak in his recent survey of etiological
theories of schizophrenia.7 He maintains that schizophrenia is a
reaction type, a kind of syndrome which may be brought about by
a mixture of organic and psychological determinants in any
proportions, from the purely psychogenic to the purely
somatogenic. Certainly, there must be a similarity between this
and the basic position taken by the diagnostic tester, that any
of a known but very wide range of test patterns can be
indicative of a particular disease.

6 Brenman, M. (Chairman), et al. "Problems in Clinical
Research."

7 Bellak, L. ... A Review and Evaluation. New York: Grune and
Stratton, 1948.

The statistical implications are obvious enough. In validational
research, it may be extremely difficult to get enough cases
showing any one test pattern to establish its relationship to
the clinical criterion. Contaminations in the Rorschach test are
widely accepted as pathognomonic of schizophrenia, for example,
yet if one wanted to validate this statement by a statistical
study, it might be necessary to test hundreds of patients before
a half dozen clear-cut contaminations were found.
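The arithmetic behind "hundreds of patients before a half dozen" can be sketched as follows. The base rate used here (two patients in a hundred showing a clear-cut contamination) is purely an assumed figure chosen for illustration; the text gives no rate. Under that assumption the sketch computes the expected number of patients needed and the chance of finding at least six such records in samples of 100 and 300.

```python
from math import comb

def expected_patients(cases_needed, rate):
    """Mean number of patients tested before `cases_needed` signs appear."""
    return cases_needed / rate

def prob_at_least(cases_needed, patients, rate):
    """Binomial probability of at least `cases_needed` signs among `patients`."""
    below = sum(
        comb(patients, k) * rate**k * (1 - rate) ** (patients - k)
        for k in range(cases_needed)
    )
    return 1.0 - below

assumed_rate = 0.02   # hypothetical: 2 per cent of records show the sign
needed = 6            # "a half dozen" clear-cut contaminations

expected = expected_patients(needed, assumed_rate)
chance_100 = prob_at_least(needed, 100, assumed_rate)
chance_300 = prob_at_least(needed, 300, assumed_rate)

print(f"expected patients: about {expected:.0f}")
print(f"chance of 6 or more in 100 patients: {chance_100:.3f}")
print(f"chance of 6 or more in 300 patients: {chance_300:.3f}")
```

With a rarer sign the required numbers grow in inverse proportion to the rate, which is the practical obstacle the paragraph describes.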

The considerations that have just been mentioned all point to
the need for statistical techniques which, if not new, are at
least unfamiliar to most clinical researchers. Perhaps we are
asking for the impossible when we say that we would like means
to handle the simultaneous variation of many variables,
sometimes curvilinear, sometimes discontinuous, and
relationships between patterns, all with relatively small
numbers of cases and preferably without the necessity of too
much computation, please. If we cannot have the moon, and if the
statisticians cannot show us why we do not really need to have
it, what does it seem to us that we must do?

Some Suggested Solutions 

A. One direction for research to take has already been
mentioned. That is toward less emphasis on quantification and
statistics and more on careful observation and the attempt to
understand what one sees. Rather than continuing to apply
obviously inadequate statistical methods, clinical researchers
might do much better to concentrate on intensive studies of
single cases, observing in as controlled a way as possible,
trying to discern meaningful relationships and to set up
hypotheses which may be tested when appropriate methods for
establishing proof are at hand. It is all too often forgotten
that statistical methods are primarily ways of proving (or more
exactly, disproving) hypotheses and only secondarily means of
finding something out. The method of Freud and the other great
clinicians who have contributed the most of our knowledge about
the kinds of problems dealt with in the field we are discussing
was the method of discerning. Kohler offers convincing arguments
that causal relations may be as directly perceived as any
others, and organismic methodologists have taught us that the
single case is and must be lawful. As a method of proof it
leaves a good deal to be desired, but it is certainly a
legitimate method of clinical research.

So great is the scientific glamor of more experimental methods
that there is a danger that too little research of this
phenomenological kind will be done. Our clinicians in training
may come to think that a couple of good courses in methodology
and statistics under their belts are a substitute for the
necessity to keep as sharp, unbiased and fresh an eye as
possible on the patient or subject himself. The aim of
observational research is not only to find uniformities and
pathognomonic signs. Just as much, it must strive to look out
for the exceptions, for the unexpected and unexplained deviation
from what textbooks and previous experience have told us. Great
strides in clinical psychology and psychiatry will still be made
through the discovery of new puzzles, new effects or phenomena
to be explained, as well as through better means of handling
familiar types of data.

B. It is not usually necessary to caution clinicians against
plunging too abruptly into quantitative treatment of their data,
but there is undoubtedly a good deal of research in the field
that suffers for this reason. As soon as a method yields
quantitative data, there is a strong temptation to subject them
at once to statistical analysis. Actually, one can waste a lot
of time in this kind of thing if he has not made sure first that
he has chosen the appropriate level of abstraction on which to
do the analysis. Dr. Horn's experiment in the diagnostic process
is, I think, a good example of the second direction for research
to take: statistical analysis on the proper level of abstraction
instead of treating the most obvious results.8 Rather than
trying to relate the quantitative results of the Rorschach, TAT
and other tests directly to characteristics of personality, he
had his judges study each test and, using it as best he could,
make ratings of these characteristics.9 Thus, the complex
patterning remained in the mind of the clinician, who could use
the most intricate interplay of reasoning from qualitative and
quantitative evidences in order to arrive at his judgments. By
using enough judges, some especially skilled in one method or
test, others in another, he could arrive at a meaningful assay
of the usefulness of each test. The important research on the
selection of clinical psychologists under the direction of Dr.
Lowell Kelly is another good example of this approach, on a much
larger scale. Premature quantification, on the other hand, can
produce the kind of sterility that is seen in much so-called
"objective" social psychology and sociology, where the easy
availability of certain superficial kinds of quantitative data
has raised false hopes of immediately discovering mathematical
laws of social behavior.

8 Horn, Daniel. "An Experimental Study of the Diagnostic Process
in the Clinical Investigation of Personality." Harvard
University, 1943. Unpublished Ph.D. thesis.

9 After hearing the Round Table, Dr. Lee Cronbach kindly
reminded me of another important type of design which can be
done on this level of generalization, but without the necessity
of ratings: matching. He has since published an account of his
own promising extension and sharpening of the method of
matching, in his article, "A Validation Design for Qualitative
Studies of Personality," Journal of Consulting Psychology, XII
(1948), 365-374.

C. A third direction clinical research may take has been
suggested by a recent clinical study of hypnotizability, by Roy
Schafer.10 With a sizeable and quite mixed group of patients, he
could find no single test score or indicator which had a
reliable relation to hypnotizability. When he made blind
analyses of the battery of tests that had been given each
subject and wrote out careful and complete descriptions of their
personalities, a number of clear and statistically reliable
differences between the good and bad hypnotic subjects appeared
in terms of kinds of ego structure and the like. This is another
example of the second direction, quantification on the proper
level, which shows, incidentally, how statistics may be used in
combination with the case-study method. Here, however, is the
main point. Recognizing the necessity of applying his newly
discovered criteria to another group (for "cross-validation"),
Schafer repeated the procedure, this time using as subjects a
group of doctors, all candidates for advanced professional
training in psychiatry. In this quite homogeneous group, not
only were the previous findings sustained, but it was now
possible to find significant differences in particular test
scores between good and bad subjects. This very striking finding
takes us back to the analysis of the kinds of controls that are
possible in clinical research. In the heterogeneous group, the
simplicity of relationship between an experimental variable,
such as a test score, and the criterion (hypnotizability) was
obscured by the many uncontrolled sources of variation in each.
One kind of control, the usual one in clinical work, was the
psychologist's understanding of the significance of these
sources of error through the total pattern of test results, so
that he could transcend them by working at a higher level of
integration. By working with a homogeneous population of
subjects, the other kind of control was attained: many sources
of error were held constant. Thereby many former variables,
which could not have been held constant statistically, became
parameters, and simple relationships on a lower level of
analysis became apparent.11 The patterning was still there, but
it was in part implicit, so to speak, in those parameters.
Whether the test correlates of hypnotizability would still
appear at another value of the parameters could only be
determined by further experiment. If they did not, we should
have a good example of the kind of discontinuity and emergence
that is met with in clinical data.

10 A summary of this research was presented by Dr. Schafer at
the APA meetings in 1948 (Abstract in American Psychologist, III
(1948), 280).
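How a simple relationship can vanish in a mixed group and reappear in homogeneous ones may be shown with a minimal sketch. The scores and criterion ratings below are invented, and the two groups are hypothetical; they are built so that within each group the test score follows the criterion perfectly, while the groups differ in level in opposite directions on the two measures.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical test scores and criterion ratings for two homogeneous groups.
group_a_score = [1, 2, 3, 4]
group_a_criterion = [2, 3, 4, 5]
group_b_score = [6, 7, 8, 9]
group_b_criterion = [1, 2, 3, 4]

# Within each group, group membership acts as a parameter held constant.
r_group_a = pearson_r(group_a_score, group_a_criterion)
r_group_b = pearson_r(group_b_score, group_b_criterion)

# Pooling the groups turns the parameter into an uncontrolled variable.
r_pooled = pearson_r(
    group_a_score + group_b_score,
    group_a_criterion + group_b_criterion,
)

print(r_group_a, r_group_b, r_pooled)
```

In these invented data the pooled correlation is zero although each group shows a perfect relationship; real groups would of course show something much less tidy.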

What are the implications of these findings for clinical
research? They offer the hope that the need for complex
statistics may be obviated in many cases when the researcher can
work with highly homogeneous populations. Rather than try to
find "typical" Bellevue scatter patterns for each of a variety
of nosological groups, research on scatter (or any other test
patterns) might proceed in the following manner. First, let us
study a sizable group of obsessive neurotics, all of whom come
from comparable socio-economic backgrounds. Let us take some of
the outstanding aspects of obsessive neurosis, such as anxiety,
intellectualization, ruminativeness, etc., and obtain
quantitative ratings of each symptom for each patient. Then we
may be able to find clear relationships between each constituent
and simple scatter patterns, such as the discrepancy between
Information and Vocabulary subtest scores. Then we shall turn to
a similar group of hysterics, or perhaps to another group of
obsessives but at a very different cultural level, and so on
down the line. In this way we should know our parameters, and
might be able to validate much in scatter analysis that escapes
the methods that have been tried so far. Other applications of
this general principle need not be spelled out here. It is
perhaps worth while to mention, however, that it should by no
means be restricted to dealing with nosological entities.

11 The sense in which this pair of terms is used here may be
clarified by referring to the experiment with the spring, above.
When different weights are used at a constant temperature,
weight is variable, temperature is parameter. Conversely, when
the spring's extension is studied by varying temperature, the
parameter of the obtained function is the value of the constant
weight used.

One further advantage of this proposed technique of research is
that it would tend to reduce the complexity of findings. As it
is, the conscientious, scientifically scrupulous man in clinical
research is caught in a distressing dilemma. On the one hand, if
he respects the subtlety and intricacy of human functioning and
tries to show it in his work by measuring all relevant
variables, his study gets more and more nearly impossible to
carry out, impossible to analyze, and the resulting report
impossible to get published or read. On the other hand, if he
oversimplifies so that the dimensions of his study are
comfortable to handle in these three respects, he knows that he
may be producing a caricature of life. Perhaps this method of
homogeneous groups, together with the method of working on a
high level of integration, may provide a way out.

Just as, in the physical example several times referred to, we
had to repeat the experiment a number of times with each of
several kinds of springs, so in clinical research we shall have
to abandon the false hope of solving a problem by a single study
on a single group. Perhaps the necessity for repetition on
different kinds of groups might bring about that end of
experimental isolation and beginning of cooperation recently
called for by Fiske in the American Psychologist.12

How Statisticians Can Help 

May I conclude with one last request of the statisticians?
Throughout this paper, when I have asked for the development of
new methods, I have been plagued with the uneasy feeling that
many of them probably existed and that I would know about them
if I read Psychometrika and the other statistical journals.
Another source of my request comes from the realization that
results all too often depend on the statistical method used, and
that many clinical researchers' ignorance of a wide variety of
methods makes them unnecessarily rigid and limited. There are
certainly many books on statistics already available, but so far
I have not found the one that I want, and I wonder if anyone has
tried to write it.

12 Fiske, D. W. "Must Psychologists be Experimental
Isolationists?" American Psychologist, I (1947), 23-afl.


It would not be a text, but a handbook of statistical methods
for clinical and other psychological research workers. It would
be a systematic compilation of formulas and methods, grouped
under the usual headings (Measures of Dispersion, of
Correlation, etc.). With each formula, there would be the
following information:

(1) The assumptions underlying its use, with concrete examples
of what they mean in terms of psychological data.

(2) Some indications of the degree to which each assumption can
be violated with impunity, and what the results of violations
will be.

(3) A discussion of the principles determining the number of
cases needed.

(4) Some examples of the kinds of problems to which the
technique is appropriate, with perhaps some examples telling
when it is inappropriate.

(5) The quickest method of computation, both with and without
the use of calculating machines; possibly also indications for
applications with IBM tabulators or other machines.

(6) I should particularly like to see expositions of short-cuts
for the calculation of direct probability.

I recently learned by communication from a mathematical
statistician13 something I was unable to find in any statistics
book I could lay my hand on: a method for getting summations of
binomial expansions. The binomial theorem is an exact method
that can often be used instead of the inexact approximation of
Chi-square, especially with very small samples, but direct
computation and summing of terms are very laborious. But, as I
learned, one can enter the tables of the incomplete
beta-function14 with a minimum of effort and read off these
summations: exact P-values for the deviation of any obtained
figure from a hypothesis. Perhaps there are other wrinkles of
this kind which should be brought to light.

13 Dr. Albert H. Bowker, whom I wish to thank for his valuable
cooperation.
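The short-cut rests on a standard identity: for a binomial variable with n trials and probability p, the chance of k or more successes equals the regularized incomplete beta function I_p(k, n - k + 1). The sketch below checks that identity on one invented example (8 or more successes in 10 trials at p = 0.5). Since no printed table is at hand, a simple Simpson's-rule integration stands in for the table look-up; that substitution, and the function names, are assumptions of this illustration only.

```python
from math import comb, exp, lgamma

def binomial_upper_tail(k, n, p):
    """Direct summation: probability of k or more successes in n trials."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def regularized_incomplete_beta(x, a, b, steps=2000):
    """I_x(a, b) for a, b >= 1 by Simpson's rule (steps must be even)."""
    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3
    # Complete beta function B(a, b), computed through log-gamma.
    log_beta = lgamma(a) + lgamma(b) - lgamma(a + b)
    return integral / exp(log_beta)

n, k, p = 10, 8, 0.5   # hypothetical example: 8 or more hits in 10 trials

by_summation = binomial_upper_tail(k, n, p)
by_beta = regularized_incomplete_beta(p, k, n - k + 1)

print(f"direct summation:     {by_summation:.7f}")
print(f"incomplete beta form: {by_beta:.7f}")
```

With a table of the incomplete beta-function, the second computation reduces to a single look-up, which is the saving in labor described above.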

The book I am hoping for will accept the fact that most of the
people who use it will not be very good mathematicians and would
get little enlightenment from a derivation. In its demands on
mathematical and statistical sophistication, and in its lucidity
of exposition, it will be as much an opposite to Fisher's
"Statistical Methods for Research Workers" as possible. I see it
as necessarily a product of collaboration between mathematical
statisticians and other men (including, if possible, a clinical
researcher) whose primary jobs are in research but who know
statistics well enough to communicate easily with them. I cannot
guarantee the sale of more than one copy, but I can guarantee
that mine would be well-thumbed.

14 Pearson, Karl (Ed.). Tables of the Incomplete Beta-Function.
London: University of London Press.

STUDENTS ON SOME WIDELY DISCUSSED
CURRENT ISSUES

During the academic year, 1947-1948, Syracuse University carried
on an all-university self survey which would furnish information
for intelligent planning during the years ahead. Included among
the various concerns of the survey was an investigation of the
program of general education of the University.

As a part of this study of general education, a sampling of
seniors, Class of 1948, and of sophomores, Class of 1950, was
asked to respond to a battery of opinion scales made up of
statements on issues that are the subject of rather wide
discussion at the present time. This battery, which is
reproduced below in Table 1, consisted of ten scales covering
the following topics: politics, government, civic relations, the
world, experts, science, philosophy, music, art, and literature.
The students were asked to check whether they agreed, had no
opinion at all, or disagreed with each statement.

Perhaps opinions and attitudes cannot be considered right or
wrong. However, if one of the objectives of general education is
to develop attitudes and opinions in the students which are
consistent with democratic values and which lead to effective
participation in our democratic society, then we will want to
know how well these opinions have developed in the students.
Also, if there are opinions about various topics which most
specialists in that field hold among themselves, then we can
consider students' opinions on these topics to be desirable or
undesirable from the point of view of whether they agree or




F   [?]     0    86     7. The best government is one which governs least.
S   [?]     7    68

F    13     7    70     8. Democracy depends fundamentally on the existence of
S    62     6    33        free business enterprise.

F    35     3    61     9. Communism and Fascism are basically and historically
S    39    13    48        similar.

F    13     7    80    10. The most serious danger to democracy in this country
S    46     6    48        comes from Communists and Communist-dominated
                           organizations.

F    13     3    83    11. Government planning should be strictly limited, for it
S    23    13    64        almost inevitably results in the loss of essential
                           liberties and freedom.

F    10     3    86    12. Individual liberty and justice under law are not
S    27    15    58        possible in Socialist countries.

CIVIC RELATIONS           Faculty N = 30    Students N = 555

      1     2     3

F    97     0     3    13. All Americans (Negroes, Jews, the foreign born, and
S    88     3     9        others) should have equal opportunity in social,
                           economic, and political affairs.

F    10  [?]3    68    14. Familiarity breeds contempt.
S    17    11    72

F    10    16  7[?]    15. Foreigners usually have peculiar and annoying habits.
S    11    15  7[?]

F     0     0   100    16. Children of minority groups or other races should play
S     1     3    96        among themselves.

F    60    10    30    17. Most children, these days, need more discipline.
S    63    12    25

F    10    14    76    18. Agitators and troublemakers are more likely to be
S    15    10    75        foreign born than native Americans.

1 = Agree                F = Faculty
2 = No opinion at all    S = Students
3 = Disagree



19. [illegible] peace until there [illegible] stronger than all the other
    countries.

20. If we lower our tariffs to permit more foreign goods in this country,
    we will lower our standard of living.

21. [illegible] ideological differences between countries are
    irreconcilable.

22. If we allow more immigrants into this country, we will lower our
    standards of culture.

23. The United Nations should have the right to make con[illegible]
    which would bind members to a course of action.

24. In the next decade, we must try to make the standard of living in
    the rest of the world rise more rapidly than in our own country.

EXPERTS                   Students N = 555    Not answered by faculty

25. The predictions of economists about the future are no better than
    [illegible].

26. [illegible] turn out to be wrong as often as right.

27. Parents know as much about how to teach children as [illegible]
    teachers know.

28. The findings of psychologists are not helpful in fitting [illegible]
    to jobs.

29. Contemporary painters, designers, playwrights, and musicians are
    engaged in work as important as my own.

30. A [illegible] in a skilled trade is worth as much to society as one
    in a profession.

SCIENCE                   Students N = 555    Not answered by faculty

      1     2     3

S   [?]   [?]   [?]    31. There are many worthwhile and important concepts
                           which cannot be proved scientifically.

S   [?]   [?]   [?]    32. The harnessing of atomic energy will bring about
                           fundamental changes in our economic and social order.

S   [?]   [?]   [?]    33. The government should promote and subsidize research
                           in the social sciences.

                       34. There will be as many or more scientific discoveries,
                           inventions, and technological changes in the world in
                           the next fifty years as there were during the past
                           fifty years.

                       35. We now have enough scientific and technological
                           knowledge to [illegible] ignorance in the world, if we
                           would only apply [illegible].

                       36. The government should promote and subsidize research
                           in the physical and biological sciences.

1 = Agree                F = Faculty
2 = No opinion at all    S = Students
3 = Disagree
PHILOSOPHY                Faculty N = [?]    Students N = 555

      1     2     3

F   [?]     0   100    37. What one does with his life is not very important,
S   [?]     1    90        except to oneself.

F   [?]     0   100    38. If the goal is worthwhile, almost any method is
S   [?]     2    84        justified in attaining it.

F    80     0   [?]    39. Personal integrity of conduct and continuous searching
S    54    20   [?]        for truth are the most important goals in life for me.

F   100     0   [?]    40. A contract is morally binding; one should never
S    75     7   [?]        default on his pledged word.

F   [?]     0   100    41. Religion has little to offer intelligent, scientific
S   [?]   [?]   [?]        people today.

F    10     0    90    42. The greatest satisfactions in life for me come from
S    15     8    77        financial success, influence, and prestige.

MUSIC                     Faculty N = 14    Students N = 555

      1     2     3

F     7     7    86    43. What is good and bad in music is a matter of personal
S    75     4    21        taste.

F     0     0   100    44. The tendency of some modern composers to use strange
S    10    15    75        harmonies and discords makes poor music.

F     0     0   100    45. Music is a form of expression which normal people are
S     5     4    91        not capable of understanding.

F     7     0    91    46. The main thing about good music is its lovely melodies.
S    27    14    59

F     0     0   100    47. There has been little or no outstanding music composed
S     7     9   [?]        in the 20th century.

F   100     0     0    48. Radio should give people much more opportunity to
S    83     9     8        hear good serious music.

ART                       Faculty N = 20    Students N = 555

      1     2     3

F     5     5    90    49. What is good and bad in art is a matter of personal
S   [?]   [?]   [?]        taste.

F   [?]   [?]   [?]    50. Art is a side line, not part of the main business of
S   [?]   [?]   [?]        life.

F   [?]   [?]   [?]    51. Modern painting (impressionism, expressionism, cubism,
S   [?]   [?]   [?]        surrealism and the rest) is mostly the work of
                           crackpots.

F    95     5     0    52. Many paintings which were considered radical at the
S    80    15     5        first are regarded as classics today.

F    50   [?]   [?]    53. New houses should be built of modern design rather
S    17    16    67        than Colonial, Cape Cod, Spanish or some older style.

F    75    15    10    54. Electric lighting fixtures should not look like
S    30    40    30        candles.

1 = Agree                F = Faculty
2 = No opinion at all    S = Students
3 = Disagree
LITERATURE                Faculty N = 20    Students N = 555

55. [illegible] who use a different and [illegible] Gertrude Stein or
    James Joyce should not be taken seriously.

56. Good writing by promising young writers is more likely to be found
    in Harper's or The Atlantic Monthly than in Collier's or The
    Saturday Evening Post.

57. Literature should not question the basic moral concepts of society.

58. Hollywood versions of novels or plays are usually as good as the
    originals.

59. The current list of best sellers in fiction and non-fiction does not
    provide a very good index of literary merit.

60. Literature should be judged primarily by its contribution to our
    understanding of the social order.

1 = Agree                F = Faculty
2 = No opinion at all    S = Students
3 = Disagree

not with the individuals who have devoted a large amount of
study to the particular field.

Different parts of the opinion scale were completed by faculty
members teaching in the area with which each part was concerned.
"Politics," "Government," "Civic Relations," and "The World"
were submitted to all personnel of the Maxwell School of
Citizenship of professorial rank and to a sampling of
instructors. The part entitled "Philosophy" was answered by
members of the Departments of Philosophy and Bible and Religion
and the chapel staff. "Music" was presented to all staff
members in the School of Music and "Art" to the staff of the
School of Art. The "Literature" section was given to selected
members of the English Department who were most concerned
with the teaching of literature. The parts labelled "Experts"
and "Science" were completed by no particular group. Eighty
per cent or more of the staff members who filled out these
scales had been at Syracuse for at least two academic years.

The response for each item which the majority of the experts
marked was taken as the correct response for that item. The
students' papers then were scored, using these faculty responses
as the key. Three items were not used because of insufficient
agreement among the experts.
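The keying procedure just described can be sketched in Python as follows. The article does not state what counted as "insufficient agreement," so the strict-majority cut-off (`min_share`), the function names, and the sample responses here are all assumptions made for illustration.

```python
from collections import Counter

AGREE, NO_OPINION, DISAGREE = 1, 2, 3


def build_key(faculty_responses, min_share=0.5):
    """Map item -> keyed response; items without a clear majority are dropped."""
    key = {}
    for item, answers in faculty_responses.items():
        response, count = Counter(answers).most_common(1)[0]
        if count / len(answers) > min_share:
            key[item] = response
    return key


def score_student(student_answers, key):
    """Number of keyed items on which the student matches the faculty majority."""
    return sum(1 for item, keyed in key.items() if student_answers.get(item) == keyed)


# Hypothetical responses, not the survey's data.
faculty = {
    8: [DISAGREE, DISAGREE, DISAGREE, AGREE],
    10: [DISAGREE, DISAGREE, AGREE, DISAGREE],
    14: [AGREE, DISAGREE, NO_OPINION, NO_OPINION],  # no majority: item dropped
}
key = build_key(faculty)
student = {8: AGREE, 10: DISAGREE, 14: AGREE}
print(key, score_student(student, key))
```

A student's score on a section is then simply the count of matches over the items that survived keying.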
The students who participated in this part of the survey
were selected from five colleges of the University: Applied
Science, Business Administration, Fine Arts, Home Economics,
and Liberal Arts. Mean scores for each part of the scale and
for the entire scale for both classes in these five colleges were
computed. Homogeneity of variances was compared by the
"F" test and the significance of the differences between the
means by the "t" test.
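For readers who want to see the two statistics named above, the following Python sketch computes a variance ratio (F) and a pooled-variance t for two groups of scores. The article does not say which form of the t test was used, so the pooled form, the function names, and the scores are assumptions for illustration; the statistics would still have to be referred to F and t tables for significance.

```python
import math
from statistics import mean, variance


def variance_ratio(a, b):
    """F statistic: larger sample variance over the smaller, with (df1, df2)."""
    va, vb = variance(a), variance(b)
    if va >= vb:
        return va / vb, (len(a) - 1, len(b) - 1)
    return vb / va, (len(b) - 1, len(a) - 1)


def pooled_t(a, b):
    """Student's t for two independent means, pooling the two variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, na + nb - 2


# Hypothetical scores for two groups, not the survey's data.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
print(variance_ratio(group_a, group_b))
print(pooled_t(group_a, group_b))
```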

When the students' responses were compared with those of
the Faculty, a rather large difference of opinion appeared for
some items (Table 1).

On item 6, concerned with government representatives voting
according to their convictions even though they are not
reflecting the opinions of their constituents, 73 per cent of the
Faculty agreed with the item and only 37 per cent of the
students. Item 8, which stated that democracy depends
fundamentally on the existence of free business enterprise, was
marked "disagree" by 70 per cent of the staff members and 32 per
cent of the students. "The most serious danger to democracy in
this country comes from Communists and Communist-dominated
organizations," item 10, was marked "disagree" by 80
per cent of the Faculty and by 48 per cent of the students.
Eighty-six per cent of the staff and 58 per cent of the students
disagreed with the statement that individual liberty and justice
under law are not possible in socialist countries (item 12).
Item 24, which stated that in the next decade we must try to
make the standard of living in the rest of the world rise more
rapidly than in our own country, was agreed to by 87 per cent
of the faculty and 54 per cent of the students.

In the area of philosophy, three of the items showed
differences between students and staff. Item 39, which stated that
personal integrity of conduct and continuous searching for
truth are the most important goals of life for me, was agreed to
by 80 per cent of the staff and 54 per cent of the students. All
of the Faculty believed that a contract is morally binding and
that one should never default on his pledged word, item 40, as
compared with 75 per cent of the students. Eleven per cent of
the students believed that religion has little to offer intelligent,
scientific people today and 7 per cent of the students had no
opinion at all on this statement (item 41). The faculty rejected
this viewpoint completely.

In the area of the Fine Arts, item 43, which said "What is
good and bad in music is a matter of personal taste," was
marked "disagree" by 86 per cent of the staff and 21 per cent
of the students. A similar item on art, item 49, was similarly
marked by 90 per cent of the staff and 25 per cent of the
students. In the area of Literature, item 56, which stated that good
writing by promising young writers is more likely to be found
in Harper's or The Atlantic Monthly than in Collier's or The
Saturday Evening Post, was marked "agree" by 95 per cent of
the staff members and [illegible] per cent of the students.

The responses of the Senior Class were then compared with
those of the Sophomore Class. Chi-square was used as a test of
significance and the responses of the two classes were found to
be significantly different for twelve items. These items were
1, 18, 34, 41, 4[?], 44, 48, 49, 54, 55 and 56. Of these twelve
items, nine showed the opinions of the sophomores to
be more in agreement with the opinions of the Faculty. The
exceptions to this were items 34, 48 and 54.
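A chi-square comparison of two classes on a single three-choice item can be sketched in Python as below. The counts are invented for illustration, and the .05 significance level is an assumption, since the article does not state the level it used.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts per class: agree, no opinion, disagree.
seniors = [50, 10, 40]
sophomores = [30, 10, 60]
statistic, df = chi_square([seniors, sophomores])
CRITICAL_05_DF2 = 5.991  # tabled chi-square value for df = 2 at the .05 level
print(statistic, df, statistic > CRITICAL_05_DF2)
```

With two classes and three response categories the table has 2 degrees of freedom, so each item is judged against the same tabled value.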

When scores on the scale were compared, no difference was
found between the mean total score of the seniors and that of
the sophomores. The Liberal Arts sophomores and the Fine
Arts seniors were significantly above the all-university mean
of their respective classes, and both classes of the College of
Applied Science and the Senior Class of the College of Business
Administration were significantly below their respective
all-university means.

On the part labelled "Politics" the Liberal Arts sophomores
were significantly above the mean of all sophomores and the
Fine Arts sophomores significantly below it. Both classes in
all colleges were well below the mean for the Faculty. In
"Government" both classes of the College of Liberal Arts were
significantly above their respective class means, the sophomores
in the College of Home Economics and both classes of the
College of Fine Arts significantly below them, and all groups
below the faculty mean. In "Civic Relations" the Liberal Arts
seniors were significantly above and the Business Administration
seniors were significantly below the all-university mean
for this class. In this section the mean for the faculty and the
means for the two classes were approximately the same. In
their opinions concerning "The World," the Liberal Arts
sophomores were significantly above their class mean and the Home
Economics sophomores were significantly below it. In this area
the seniors more closely approximated the faculty mean than
did the sophomores. In the area on "Experts" no significant
differences were found between any of the college and university
means. On the "Science" part of the opinionnaire, the Liberal
Arts seniors were significantly above the all-university mean
for seniors.

In the part on "Philosophy" no differences were found and
both classes were lower than the faculty mean. In "Music,"
the Fine Arts seniors and Home Economics sophomores were
significantly above their class means and both classes of the
College of Applied Science and the Business Administration
seniors significantly below them. In "Art" both classes of the
Colleges of Fine Arts and Home Economics were significantly
above their respective means and both Applied Science classes
and the Business Administration seniors significantly below
them. On "Literature" the Fine Arts seniors were significantly
above the all-university mean for this class and the Applied
Science seniors significantly below it. On these last three
sections of the opinion scale, the mean faculty opinion score was
much higher than the student mean score.

When the two classes in the same college were compared,
significant differences were found in a few cases: the Liberal
Arts seniors were significantly higher in "Science" than the
sophomores in the same college; the Business Administration
sophomores were higher than the seniors in "Civic Relations";
the Fine Arts seniors were higher in "Civic Relations," "The
World," "Art," and "Literature"; and in the College of Home
Economics the seniors were significantly higher than the
sophomores in "The World," "Music," and "Art."

Summary 

1. There is considerable difference of opinion among the
students of Syracuse University on widely discussed current
issues. On most of the items in these opinion scales, the majority
of the students agreed with the Faculty.

2. On a number of the items, however, the replies of the
students differ considerably from those of the experts.

3. Seniors generally reflect opinions which are no closer to
those of the experts than are the opinions of the sophomores.

4. In general, students specializing in an area reflect better
opinions in their own area. The farther one gets away from his
own area the wider do his opinions differ from those of experts.
Yet the statements of opinions included in the scales were
designed to reflect broad insights and generalizations about
various topics rather than points of view which would be peculiar
to the specialist. The need for broader general education is
therefore suggested.
Selection is commonly carried out in professional schools
as though it were a process independent of the educational
program. Admissions tests are given, candidates ranked, and
selections made with little or no reference to the conduct of
other aspects of the school's work. In this process, tests lend a
certain appearance of respectability as well as of objectivity,
although their effectiveness may not have been empirically
determined by the school. Possibly much of this difficulty stems
from the fact that the tests are ordinarily developed by outside
specialists and so are not accepted by the staff as integral parts
of the total educational program. Whatever the causes of this
difficulty, the fact remains that the planning and conduct of
admissions procedures are not commonly integrated with the
planning and conduct of other aspects of the school's program.

It is the purpose of the article to re-state the problem of
selection in terms of a much broader framework than conventionally
conceived. Most of the discussion will focus on three rather
basic problems: determining criteria of proficiency, determining
the predictive measures, and limitations to prediction. It will
be assumed that prediction¹ is the essential element in the
process of selection.

Determining the Criteria of Proficiency

Every selection program involves judgments as to the future
performance of the candidates. Since future performance serves
as a criterion against which selection procedures may be
validated, it becomes necessary to clarify the nature of this
performance. Two broad kinds of performance may serve
alternatively as the criterion we wish to predict: success in the
professional school and success in professional work. In the
terminology of Thorndike (5), the latter is the ultimate
criterion; it is the ultimate criterion because it represents the
final goal of professional education. In this sense the ultimate
criterion is the test not only of the effectiveness of selection
procedures but also of the program of professional education.

¹ Although the term "prediction" is used in this article, the writer recognizes that
the most one can do is to make a statement of an individual's relative probabilities of
success in specific activities rather than "predict" what he will actually do. Cf. Super

Success in Professional Work. As stated, "success in
professional work" is obviously too nebulous to be of use in guiding
the validation of selection procedures and the curriculum of
the professional school. The concept calls for considerable
analysis, especially with reference to the definition of professional
work and criteria of success in it.

It would be naive, of course, to talk of the position of doctor,
lawyer, or teacher as though there were one uniform type.
Professional positions vary from place to place and from time
to time. Ultimately, there are as many positions as there are
individuals engaged in a profession, for to a certain extent each
position is determined by what the particular practitioner puts
into it. This generalization is true of any occupation.
The technique of job description may be utilized as a means
of classifying the major kinds of positions in a profession. Job
descriptions of a wide sampling of positions indicate the
activities performed by the practitioner, their relative importance,
and the conditions under which they are performed. But the
question of what should be the major types of positions in the
profession still remains, for the descriptions do no more than
describe and clarify activities as they exist. One recent study,
for example, found that about a quarter of a hospital nurse's
time is spent in bathing and feeding patients and in clerical
and routine duties, many of which might be performed as
readily by non-nurses (1). This finding raises the question of
whether or not nurses should be relieved of such duties so that
they may devote themselves more completely to work of a
professional nature. Doubtless this issue is very much alive
today. The point is that the collection and classification of job
descriptions simply provide us with normative data; they
describe activities as they are, rather than as they ought to be.
Illustrations can also be drawn from other professions. In teaching,

SELECTION OF STUDENTS

some individuals must devote so much time to the keeping
of pupil records that they come to think of personnel work
as synonymous with record-keeping. And in law, one wonders
what the effect of procedures analysis and simplification studies,
on the one hand, and the application of the personal frame of
reference of modern psychology (2), on the other, would be
upon the entire legal profession and the law schools. The major
kinds of activities that make up a profession at any time are
normative, at best.

Assuming stability of tasks and responsibilities which go into
the major types of positions in a profession, job analysis, as
distinct from job description, may be utilized to identify the
characteristics needed by the worker for successful performance.
It asks this question: "What kinds of knowledge, abilities, and
traits does a practitioner need in order to be successful in a
given type of position?" Answers to this question are of value
to the professional school in setting up its curriculum. The
kinds of knowledge, abilities, and traits required for particular
kinds of professional activities suggest objectives for the
curriculum. The objectives for the curriculum (that is, the kinds
of learning to be fostered in students) give clues as to the
particular aptitudes and traits required of the professional
students.

Implicit in the process of job analysis is the idea that criteria
of proficiency exist or can easily be established. Probably the
day-to-day proficiency of the practitioner could be readily
ascertained, if necessary. Ordinarily, however, indexes of
proficiency covering a relatively long period of service are sought.
Several such indexes or criteria have been suggested for any
profession: length of service, salary or income derived from
work, and rank or professional standing. The limitations of any
one of these are easily inferred.

Criteria of proficiency are basic to any definition of "success
in professional work," but they do not tell the whole story.
Success goes beyond mere proficiency to take in the
practitioner's satisfaction with his occupational pursuits, and so is
subjective from that standpoint. Also, the public has a stake
in the matter. Results, services rendered, are the chief
touchstones of the public in appraising the success of a practitioner.
The discussion up to this point, in short, has emphasized
that success in professional work is an elusive and relative
concept. Although the difficulties are great in defining the major
types of positions in a given profession and in determining
criteria of success, the basic fact remains that the ultimate test
of admissions procedures and professional education lies in
demonstrated success on the job.
Success in the Professional School. This is a more immediate
and direct criterion against which to validate selection
procedure. Again, the meaning of success needs to be clarified:
what should success include? Certainly it ought to include
graduation from the school. This is, after all, the official act of
the school indicating that the student is qualified to engage in
the profession. But graduation also implies that the successful
student was able to persist in the school, to complete the
program of professional education despite the possible influence
of certain factors which tend to cause withdrawal or
elimination. This point needs emphasis because many unpredictable
factors only remotely related to academic achievement can
cause a student to drop out or be eliminated.

More specifically, of course, graduation implies that the
student has achieved certain objectives of instruction. He has
acquired a body of knowledge, developed certain skills and
abilities, and cultivated certain traits which the faculty has
judged necessary for him to possess as a prospective
practitioner. The degree of his development in each of these major
directions constitutes the several criteria which denote the
degree of his success in the professional school. These are the
kinds of achievement which selection procedures necessarily
must attempt to predict, if there is to be any prediction worthy
of the label "scientific."

These desired outcomes are crucial as guides to curriculum
development, the selection of instructional materials and
methods, the development of examinations and other evaluation
instruments, and the development and validation of selection
procedures. Hence it is desirable that they be carefully
determined by the school. This is a curriculum problem, and
well-established procedures for attacking it systematically have been
developed by curriculum specialists. Such a systematic
procedure, more elegantly termed a rationale, has been
comprehensively outlined by Tyler (6). This procedure can be clarified
by showing how it could be applied in a field, nursing
education, for example.

A rationale for determining instructional objectives includes
two basic steps, one concerned with the getting of suggestions
as to possible goals and the other with the actual sifting of those
suggestions. In the field of nursing education, suggestions as to
possible curricular objectives may be obtained from a number
of sources: reports of curriculum committees such as that of the
National League of Nursing Education, reports of other groups
representing more specialized subject areas, reports of job
analysis, follow-up studies of graduate nurses, and studies of
student nurses themselves. Reports of various curriculum
committees indicate, of course, what these experts consider to be the
most worthwhile outcomes of instruction for their particular
areas. These reports are valuable because they represent an
accumulation of experience as to what is worth teaching.
Reports of job analyses are valuable for keeping the curriculum
in step with the actual needs of the nurse on the job, and for
keeping the curriculum in step with advances in medicine and
nursing. Reports of job analyses may also suggest the
desirability of setting up fields of specialization within the nursing
curriculum to parallel the different kinds of nursing fields.
Follow-up studies of graduate nurses can be helpful to the school
of nursing in identifying points of strength and points of
weakness in professional preparation. Points of weakness are
particularly important because they suggest needs that should
have been anticipated and met. For example, possible needs
might include increased skill in applying principles of mental
hygiene in dealing with patients, or knowledge of new
techniques arising out of atomic research. In the same way, studies
of student nurses may suggest points of strength and points of
weakness peculiar to the particular student population, so that
the curriculum ought to be adjusted to devote more time to the
weaknesses and less to the points at which the students are
already quite competent.

Since the consideration of each of these sources will ordinarily
result in a list of instructional objectives longer than the
individual school of nursing can provide for, it is necessary to
screen these various suggestions to select a smaller number that
can be attained by students during the period of their
professional education. Here, again, the process should be a rational
one; deliberate effort should be made to make choices that
reflect a careful consideration of relevant criteria. A first
criterion might be the philosophy or concept of nursing held by
the particular school. If the school places great value upon
health teaching as well as care of the sick, upon mental
hygiene as well as physical treatment, then it will try to provide
instruction in these areas. On the other hand, if a school
adheres to a more conservative definition of nursing it may not
feel compelled to provide such instruction. Similarly, the
resources of the school act as a screen in determining what kinds
of instruction it should and should not attempt to provide.
Resources here would include the staff, library, laboratories,
equipment, and the like. Another important factor influencing
the curriculum is the kind of nurse demanded by the various
hospitals which draw upon the school of nursing. It is
conceivable that in a certain geographical area there may be much more
demand for certain kinds of nurses than for other kinds. This
differential demand thus serves as an important factor in
determining the kinds of preparation which a school of nursing
emphasizes.

The result of this procedure of screening possible objectives
will not be uniform. The particular objectives which one school
of nursing attempts to attain will differ somewhat from those
of another. This variation is not necessarily undesirable; it will
reflect, in part, the adaptation of the curriculum to local
conditions. Actually, in view of the similarity of nursing activities
from place to place, one would expect the various curricula to
emphasize many common objectives.

In the process of formulating objectives, it is important that
they be listed for each course or comparable unit of instruction.
It is not enough to say that we are going to teach human
physiology, or mental hygiene, or surgical nursing. A statement of
this kind is too vague, for it does not indicate in sufficient detail
the particular understandings, skills, and abilities to be
developed. It is necessary to go on and actually list the important
facts, concepts, and generalizations, and the important skills
and abilities that are desired as outcomes of a given course.

Such a listing serves to facilitate the next step, that of
defining these outcomes so as to make them measurable. If we
are to predict the ability of students to achieve the outcomes of
instruction, then we need to have a fairly clear idea of what they
are. Take the desired outcome, understanding of and ability
to apply principles of mental hygiene. This objective, as it
stands, is general and needs to be clarified by listing the
specific components which it includes. Some of the important
specific behaviors which make up this more general outcome might
be the following:

a. Understanding of the basic principles of mental hygiene.
b. An objective or scientific point of view toward human
   behavior, i.e., a recognition that all behavior is caused.
c. Ability to distinguish between symptoms and causes of
   behavior disorders.
d. Ability to make a satisfactory tentative diagnosis of common
   behavior disorders.
e. Ability to adapt nursing and therapeutic techniques to
   patients with differing needs and conditions.
f. Skill in helping patients to develop insight into their own
   problems and to work out their own solutions.

When objectives have been analyzed in this manner, they
become fairly precise specifications for the development of
achievement tests and other means of appraisal, as well as for
the detailed planning of the curriculum. Success in the
professional school may then be considered as the extent to which
students have actually attained these defined objectives at
various stages of their professional education. These measures
of status or achievement constitute the best criterion measures
for validating selection procedures. They are the kinds of
performance we are trying to predict.

Ideally, we would like to have some assessment of the
student's status and achievement with respect to all of the
important objectives at the time of completing the professional
program. Since this is likely to be impractical under present
conditions, it may be necessary to use as criterion measures
achievement at particular stages of the total program. For
example, the student's performance at the end of the first
semester may be used as an index of performance in later
semesters. First-semester performance is actually a rather good
predictor of later performance, particularly when taken in
combination with a measure of pre-professional scholarship
performance and one of scholastic aptitude. A limitation of
first-semester or first-year performance as a criterion of success for
the entire program is that the type of school work often tends
to shift from theoretical to practical courses. This shift in
work places somewhat different demands upon the student.
However, in determining the usefulness of selection tests,
first-semester performance can serve fairly well as a criterion of
success for all entrants in the school, including early
withdrawals, and as a predictor of success for those students who
continue beyond the first semester. Criterion measures, of
course, may be limited to measures of achievement in particular
courses.

In addition to these partial criteria of performance, two other
criteria should be mentioned, namely, graduation and over-all
grade-point average. The criterion of graduation vs. non-graduation
is useful in determining the characteristics which
distinguish between students who graduate and students who
withdraw or otherwise fail to graduate. The over-all grade-point
average may be useful as a single index of success because of its
stability and convenience, but its exclusive use in many
prediction studies serves only to perpetuate the hackneyed
practice of determining the empirical relationship between grades
and scores on aptitude tests. In general, these two criteria
suffer from the limitation of not being analytic; they do not
analyze achievement into its component parts. Broad criteria
of this kind are supplementary measures, but not substitutes for
separate measures of the degree to which each important goal
of instruction has been achieved by students.

Determining the Predictive Measures 

The second important problem is that of determining what
measures to use in the process of selection. The test of any such
measure, of course, is its usefulness in identifying good risks for
professional education. A selection or predictive measure is
useful to the extent that it indicates a candidate's relative
probabilities of success in completing various aspects of the
program. This statement implies that the problem is basically
statistical, for its data can be expressed in terms of relative
probabilities of success.

Relative probabilities of success are determined empirically
by relating two sets of data: (1) criteria of proficiency which
become available during the course of professional education or
later professional work, and (2) appraisals of the candidate's
behavior and characteristics prior to, or early in, professional
education. Relationships established empirically on one group
are then ordinarily applied to subsequent groups, provided the
attendant conditions have not appreciably changed.
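
One simple way to picture the empirical determination of relative probabilities of success is an expectancy table: candidates from an earlier group are sorted into bands on the predictive measure, and the proportion who later met the criterion is computed for each band. The sketch below is illustrative only; the function name, the score bands, and the sample data are assumptions introduced here and do not come from any study cited in this article.

```python
from bisect import bisect_right


def expectancy_table(scores, succeeded, cutoffs):
    """Proportion of successful candidates within each predictor-score band.

    scores    -- predictor scores for an earlier group (hypothetical data)
    succeeded -- True/False criterion outcome for each candidate
    cutoffs   -- lower bounds of the second, third, ... bands
    Returns one (n_in_band, proportion_successful) pair per band; the
    proportion is None when a band is empty.
    """
    cutoffs = sorted(cutoffs)
    counts = [[0, 0] for _ in range(len(cutoffs) + 1)]  # [n, successes]
    for score, ok in zip(scores, succeeded):
        band = bisect_right(cutoffs, score)  # a score equal to a cutoff joins the upper band
        counts[band][0] += 1
        counts[band][1] += 1 if ok else 0
    return [(n, (k / n if n else None)) for n, k in counts]


# Hypothetical earlier group: eight candidates, three score bands.
scores = [300, 420, 450, 510, 530, 600, 640, 700]
succeeded = [False, False, True, False, True, True, True, True]
table = expectancy_table(scores, succeeded, cutoffs=[450, 550])
```

The proportions obtained from one group would then be applied to later candidates in the same bands, subject to the proviso stated above that the attendant conditions have not appreciably changed.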

Correlation is the usual method for expressing relationships
between these two sets of data. Essentially the task is to get
predictive measures such that variance on them will be highly
associated with variance on the criterion measures. When such
correlation is obtained, it is assumed that the two kinds of
performance, predictor and criterion, have much in common:
the factors that bring about high performance on the predictors
are substantially the same factors that bring about success in
various aspects of professional education.

A crucial decision, therefore, is that involving the choice of
selection tests. Unfortunately, this choice has all too often
been made without a careful study of the functions measured
by the tests and their relationship to the functions measured by
the criteria. Tests have frequently been tried out in a hit-or-miss
manner in the hope that they would correlate highly with
the various criteria. While all selection tests should be validated
empirically through follow-up studies of their effectiveness,
it is sound practice to try to obtain validity at the outset
through rational analysis of the criteria. A careful analysis of
the abilities and traits required for successful performance on
the criteria should result in a series of hypotheses as to what the
selection tests ought to measure. For example, the curriculum
of a medical school will typically include a great deal of course
work in the natural sciences. Consequently, one might
hypothesize that the ability to read and analyze fairly difficult
natural science materials would be an important factor
contributing to academic success. In the same way, one could
analyze other aspects of the program and attempt to form
rational hypotheses as to the abilities and traits required for
successful performance. These hypotheses constitute, in effect,
specifications for our selection tests. They provide a rational
basis for choosing from among existing tests and for developing
new methods of selection.

Another approach to the determination of predictive measures
involves a comparison of students who graduated with
those who eventually withdrew or failed to graduate. The basic
idea is that the graduates possessed certain presumably desirable
characteristics at the time of entrance which were not
possessed, or were possessed in a limited degree, by the
non-graduates. Hence the problem is to identify such characteristics
which will differentiate statistically between the two groups
of students.

The use of this method does not mean that any and all
characteristics found to differentiate should be accepted
unequivocally. A characteristic so identified should have some real
relationship to success in the professional school and in later
professional activity, and should be scrutinized from the
standpoint of broader social considerations. For example, the
possibility that a higher proportion of graduates of medical schools than
of non-graduates had a relative who was a doctor would not
necessarily mean that, other factors being constant, preference
be given the candidate who was related to a doctor. The
statistics themselves may reflect traditional admissions practices.
True, relationship to a doctor may frequently be an indirect
index of necessary aptitudes and interest, but an important
social question remains. This is the question of whether
admissions procedures and standards should perpetuate the status
quo in the profession by selecting candidates who are quite
similar to each other in all of their characteristics and to the
active practitioners. The writer will not attempt to answer this
question.

Hypotheses as to possible differentiating characteristics may
be drawn from such sources as a review of previous
investigations, analysis of admissions and other biographical data, and
observations of students in counseling and classroom situations.
A sample list of hypotheses that might be applicable to the
different fields of professional education includes the following:

a. The graduate, in comparison to the non-graduate, is more
likely to have based his vocational choice upon realistic
considerations than upon stereotyped conceptions of the profession
or a too lofty idealism. That is to say, he will look upon the
field primarily as a worthy occupation for which he is well suited
rather than solely as a quick way to get rich, or a glowing
opportunity to serve a sick and downtrodden humanity, or some
similar consideration. The basic assumption here is that a realistic
attitude will enable the student to persist in his education
whereas other considerations may be inadequate to overcome
basic deficiencies in such factors as health, aptitudes, and a
genuine interest in the field.

b. The graduate, in comparison to the non-graduate, is more
likely to have had better study habits in professional
education. The assumption here is that efficient study habits formed
in the earlier years of education tend to persist during the later
years and to influence markedly the student's academic success.

c. The graduate, in comparison to the non-graduate, is more
likely to have had good health at the time of entrance and
during professional education. The assumption here is that good
health enables one to work energetically and to withstand the
physical strains that accompany work in the professional school
and the occupation itself.

d. The graduate, in comparison to the non-graduate, is more
likely to have been well adjusted socially and emotionally.
His personality was more likely to be stable and free from
emotional conflicts. It is assumed here that the well-adjusted person
is better able to maintain an even tempo in his studies and
withstand the various environmental and emotional forces that
disrupt effective learning.

Other hypotheses could be offered in addition to the
foregoing. Such characteristics as socio-economic status of family,
size of high-school graduating class, extent of participation in
extracurricular activities in high school, and marital plans might
reveal important differences which could function as indirect
indexes of more directly desirable qualities. They should
probably be scrutinized from the standpoint of social desirability
as well.

It is to be noted that the assessment of these personal and
background characteristics frequently calls for methods of
appraisal other than formal paper-and-pencil tests. The
assessment of health, of course, is best accomplished by a physical
examination. Various items of background information such as
reasons for vocational choice, father's occupation, and the like
may be obtained from a carefully developed biographical
questionnaire. Evaluations of study habits, motivation, and
personal-social adjustment may be obtained in the form of
appraisals by former instructors, but the alternative methods of
obtaining these evaluations are quite numerous.

This approach, in short, provides additional standards useful
in the selection process. Many of them are of a non-intellectual
sort and therefore complement the more formal measures of
intellectual abilities and knowledge. It is in this direction that
much future research will profitably move.

Limitations to Prediction

Even though a professional school has established various
criteria of proficiency and has utilized valid predictive measures,
it will still find that its procedures leave much to be desired in
the way of accurate selection. When the correlation technique
is used, the term "ceiling" is applied. "Ceiling" refers to the
fact that most predictive measures seldom account for more
than 50 per cent of the variance in the criterion scores. This
generalization holds true for various combinations of aptitude
measures as well as for single measures. The addition of one or
more other measures to one which already accounts for 40 to 50
per cent of the variance in achievement generally does not
raise the correlation appreciably. A ceiling or limit to prediction
is reached no matter how many aptitude measures are used or
how valid they otherwise appear to be.
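
The arithmetic behind the ceiling can be sketched briefly. The proportion of criterion variance accounted for is the square of the correlation, so 50 per cent of the variance corresponds to a correlation of about .71. For two predictors, the standard formula for the squared multiple correlation shows why a second aptitude measure that overlaps heavily with the first adds little. The correlations used below are hypothetical values chosen for illustration; they are not results from any study discussed here.

```python
import math


def variance_accounted_for(r):
    """Proportion of criterion variance associated with a correlation r."""
    return r * r


def multiple_r_squared(r1, r2, r12):
    """Squared multiple correlation of a criterion with two predictors.

    r1, r2 -- correlation of each predictor with the criterion
    r12    -- correlation between the two predictors (must not be +1 or -1)
    """
    return (r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 * r12)


# Hypothetical case: a first test correlating .65 with the criterion, and a
# second test correlating .50 with the criterion and .70 with the first test.
single = variance_accounted_for(0.65)
combined = multiple_r_squared(0.65, 0.50, 0.70)
gain_in_r = math.sqrt(combined) - 0.65
```

Under these assumed values the multiple correlation is scarcely higher than the correlation for the first test alone, which is the pattern described above.
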
Two important implications follow from this fact. The first
is that we are somewhat limited in the accuracy with which we
can predict academic achievement. While, in general, the
students who make the higher scores on the aptitude tests will
also make the higher scores on the achievement tests, some of
these able students will achieve at only a mediocre level or will
eventually even drop out of school entirely and,
correspondingly, some of the less capable students will achieve at a level
much above expectations and will complete the program
successfully. Obviously, such departures from the expected will
limit the usefulness of our selection procedures in identifying
good risks.

The second implication is that the remaining 40 to 50 per
cent of the variance in achievement not accounted for by
measured aptitudes must be attributed to other factors. These other
factors are usually considered to be of a non-intellective sort,
particularly motivational, but up to the present they have not
been accurately defined and measured.

Even if these other important systematic factors were
identified and measured, it is quite likely that the accuracy of
prediction would be limited. A scrutiny of the learning process is
enough to indicate this, for many factors other than aptitudes
influence the degree of learning day in and day out: interest in
particular subjects, degree of rapport with particular
instructors, extent of participation in extra-curricular and out-of-school
activities, health, home conditions, financial resources, and
marital status, to mention the most important. Furthermore,
like many events in life, these factors are to a certain extent
unpredictable. And although they may operate quite
independently of aptitudes, collectively they are almost as
important, for they influence the effectiveness with which the
student uses his capabilities.

Limitations may inhere in the various criterion measures.
This is not surprising in view of the complexity of learning.
By and large, the criterion measures now in use (the average
grade in a course or program) have a rather limited validity.
A grade in a course or program is, after all, a kind of summary
evaluation which indicates the over-all success of the student.
Such evaluations have some usefulness in prediction studies
but, in general, suffer from the limitation of not being analytic,
since they do not indicate the extent to which each one of a
comprehensive array of desired outcomes has been achieved by
individual students. The weakness of an average is that it is
always somewhat artificial; it implies uniformity where
variability is the rule. Instead of describing the pattern of
achievement over the various instructional objectives, it yields only a
conglomerate the parts of which are rather nondescript.

Grades and grade-point averages commonly suffer from
another serious limitation: their meaning varies; they may
represent different things in the same course. A given grade may
signify one of several things. It may be used to indicate the
amount of progress that a student has made in a course,
irrespective of the final level of achievement. Or, it may indicate
the amount of effort that the student has put forth, regardless
of progress or final status. More commonly, of course, a grade
is used as a measure of the final status or level of achievement
that the student has reached, regardless of the amount of
effort expended or the relative degree of progress made. Now
what frequently happens is that a given instructor may be
assigning grades to different students on different bases, rather
than on a uniform basis. Similarly, different instructors teaching
the same course may use quite different bases and standards
in assigning grades. Variations of this kind tend to lower the
validity of the grades as criterion measures, and consequently
reduce the correlation between the predictive and the criterion
measures. Unreliability of the criterion measures, of course,
also lowers the validity coefficient.

These are not the only problems in connection with the
validity and reliability of criterion measures, but they are
among the most important. Problems similar to these also arise
in attempting to obtain satisfactory predictive measures. They
need not be elaborated here.

One other source of difficulty should be discussed: that source
is the relative homogeneity or restricted range of talent of the
student population. Validity coefficients increase when a test
is used on a group with a wide range of aptitude, and decrease
when the test is used on a relatively homogeneous, preselected
group. Since many students of relatively low aptitude are
refused admission to professional schools, the group finally
admitted is always more homogeneous in aptitude than the
complete group of applicants. Had all of the students who applied
been admitted and allowed to continue in school as long as they
could, we would obtain higher coefficients of correlation between
aptitude tests and school achievement. Practically, of course,
there is no point in admitting poor risks if we have good grounds
for considering them so.
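
The size of the effect of restricted range can be estimated with the standard correction for direct selection on the predictor (often credited to Pearson and listed by Thorndike as "Case 2"). The sketch below is not a procedure taken from this article. It assumes that selection was made directly on the test in question and that the regression of the criterion on the test is linear with equal scatter throughout the range. The function name and the sample figures are hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Estimate the validity coefficient in the full applicant group.

    r_restricted    -- correlation observed in the admitted (selected) group
    sd_unrestricted -- standard deviation of the test among all applicants
    sd_restricted   -- standard deviation of the test among those admitted
    """
    k = sd_unrestricted / sd_restricted
    r = r_restricted
    return (r * k) / math.sqrt(1.0 - r * r + r * r * k * k)


# Hypothetical figures: a validity of .40 among admitted students, whose test
# scores are only two-thirds as variable as those of all applicants.
estimated = correct_for_range_restriction(0.40, sd_unrestricted=150.0, sd_restricted=100.0)
```

With these assumed figures the estimated coefficient for the complete applicant group is about .55, against .40 in the admitted group.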
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This rather well-known phenomenon of shrinkage raises an 
important consideration. Thus, the effectiveness of selection
may be judged in terms of the magnitude of the validity
coefficients, rather than in terms of the proportion of candidates
who have successfully completed the program of the professional
school. If we do look at the proportion of successful candidates,
rather than at selection procedures only, then certainly we need
to consider how effective the total program was in bringing the
candidates up to acceptable standards of development. (A
great deal happens after a student is admitted to a professional
school.) The validity of objectives, the effectiveness of
instructional methods and materials, the effectiveness of student
personnel services, and the validity of methods of evaluating
student progress all need to be reconsidered. This all goes back to
the fundamental thesis of this article, namely, that selection
procedures should be made an integral part of the total
educational program.

Improving Selection Procedures

The thesis of this article is that the planning of admissions
or selection procedures should be integrated as closely as
possible with the planning of other aspects of the educational
program. Those aspects of the program of professional education
which are closely related to the planning of admissions
procedures are the following:

The selection and definition of instructional objectives
The selection and organization of learning experiences to attain
these objectives
The development of measures of the extent to which these
objectives have been achieved by students
The development of measures to predict achievement of
the objectives
The carrying on of continual research

Operationally defined objectives are the threads which tie
together these various aspects of the total program. Objectives
are crucial because they function as criteria by which the
curriculum is planned, carried out, and evaluated. Stated
otherwise, objectives become criteria by which subject matter is
outlined, instructional materials selected, instructional methods
developed, and examinations and other instruments of
evaluation prepared. A change in the nature of objectives will
ordinarily necessitate corresponding changes in the various other
aspects of the total program.

The careful formulation and definition of the professional
school's objectives make possible the development of
examinations and other means of appraisal which will reveal the degree
to which these various objectives have been attained by
individual students. The scores derived from these tests
represent the most directly valid criterion measures that can be
obtained.

The analysis of objectives and the development of valid
measures of their attainment then make possible a more rational
choice of selection procedures. The individual professional
school should form hypotheses as to the abilities and traits
required of students if they are to complete the program, that is,
attain the various objectives. These hypotheses may then serve
as tentative guides for choosing aptitude tests and developing
other selection procedures. This kind of approach would be a
decided improvement over the practice of choosing tests
because they happen to be widely used.

Aptitude tests and other selection procedures thus need to
be validated against the criteria of proficiency derived from the
educational objectives. But selection procedures should not be
validated against the criteria solely in terms of validity
coefficients, but also in terms of the proportion of candidates
who have successfully completed the program. And so the
emphasis is as much upon the program of the professional school
as it is upon the admissions procedures.

Each professional school should carry on continual research on
the effectiveness of its selection procedures and various other
aspects of its total program. Selection procedures need to be
empirically validated, since one cannot assume that they will
be effective in one situation if they have been so in others.

Continual research is also necessary because of the fact of
change. Changes may be introduced in the curriculum: new
objectives may be added, others revised or dropped; standards
of achievement may be raised or they may be lowered. Any
one of these will change the criteria of proficiency. Changes
may occur in the nature of the candidates and the student
group finally selected. Some professional schools may be
attracting many more capable candidates than formerly;
perhaps the opposite would be true in other schools. In addition,
new developments in techniques of assessing promise and
aptitude for professional education may be worth trying out. All
of these considerations and many others suggest that the
professional school make a periodic check on the effectiveness of its
program.

Early in [year illegible] the College Entrance Examination Board
decided to attempt the development of an instrument for
measuring nonintellectual factors associated with subsequent
academic achievement in college. Previous attempts to predict
academic achievement through the use of attitude, interest,
motivation, or personality inventories had uniformly shown a
positive but disappointingly low relationship. Ellis (1, 2, 3)
has been most active in summarizing results of experiments in
this field. Nevertheless, we felt that better success might be
obtained by the use of a questionnaire especially designed for
inquiring into pertinent attitudes, interests, and motivations
of entering college freshmen, previous experimentation having
been carried out largely with the use of standard instruments
not especially designed for this purpose. Accordingly, we
undertook this research for the College Board.1
Items were constructed covering areas roughly blocked out
as: motivation for attending college, intellectual interests,
teacher relations and study habits, withholding from outside
activities, and parental backing. Altogether, the original
questionnaire presented the respondent with [number illegible]
questions. This was administered in October, 1947, to the members
of the freshman class (Class of 1951) of an eastern women's liberal
arts college three weeks after their admission to the college.
This first administration showed that, although the
questionnaire was administrable, its length made it very unwieldy.
Therefore, the sections covering parental backing and
withholding from outside activities were arbitrarily dropped in
subsequent reproductions. The effect of this action was to
reduce the number of similar scorable attitude-interest items
between the first and later administrations to 145.

During the remainder of 1947, and throughout the spring
and summer of 1948, the revised questionnaire was administered
to applicants for the 1948-49 freshman class (Class of 1952)
at this same women's college along with their usual
application papers.2 Questionnaires received from applicants who for
any reason were not later actually admitted to the Class of
1952 were discarded.

Allowing the Class of 1951 to complete their freshman year,
the initial task we set for ourselves may be summarized as
follows:

1. Give each member of the Class of 1951 a relative
achievement rating or index, i.e., an index of freshman-grade
performance relative to scholastic aptitude.

2. Compare the questionnaire responses of overachievers with
those of underachievers in order to find those items showing the
greatest response differences.

3. Compare the questionnaire responses of the Class of 1951
with those of the Class of 1952 in order to determine what
effect the difference between postadmission administration (Class
of 1951) and preadmission administration (Class of 1952) had
on average responses.

4. Prepare a key by which to score the Class of 1952
questionnaires, based upon overachiever-underachiever response
differences found in the Class of 1951, and adjusted for
preadmission-postadmission response differences that may have been
found to affect keyable items.

5. Using this key, score the Class of 1952 questionnaires,
thus obtaining a predictor of academic achievement for the
members of this class.

2 The questionnaire has also been administered under similar conditions to applicants
at this college for the 1949-50 freshman class (Class of 1953). However, in this paper
we are interested only in the first two administrations.

The Achievement Index. All applicants for admission to the
college where we carried out our study are required to take the
Scholastic Aptitude Test of the College Entrance Examination
Board. These scores on the SAT verbal and mathematical
sections compared to Freshman-Year Grade-Point Averages of
the members of the Class of 1951 formed the basis for
assigning an achievement index to each student. The following
regression equation was used in calculating the most probable
Freshman-Year GPA from the SAT-V and SAT-M scores:
$\hat{Y} = b_1 X_1 + b_2 X_2 + c$ [numerical coefficients illegible],
where $X_1$ = SAT-V and $X_2$ = SAT-M.
The multiple correlation of SAT-V and SAT-M with
Freshman-Year GPA was found in this case to be .51. The
intercorrelations of these measures are shown in Table 1.

Each student was then assigned an achievement index on a
standard scale having a mean of [value illegible] and a standard
deviation of 4. Those whose test scores and GPA brought them furthest
below the regression line received the lowest achievement
indices and were by definition underachievers, while those
furthest above the line received the highest achievement indices
and were by definition overachievers.
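
The construction of such an index can be sketched as follows: fit a least-squares regression of grade-point average on the two aptitude scores, take each student's departure from the predicted value, and rescale the departures to a chosen mean and a standard deviation of 4. This is a minimal sketch rather than the authors' computation. The data are invented, the scale mean of 10 is a placeholder, and the code assumes a grading scale on which higher numbers mean better work; on a scale that runs the other way, the sign of the departure would have to be reversed.

```python
from statistics import mean, pstdev


def fit_two_predictor_regression(x1, x2, y):
    """Least-squares weights (b1, b2) and constant c for y on x1 and x2."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # zero only if the predictors are collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2, my - b1 * m1 - b2 * m2


def achievement_indices(x1, x2, y, scale_mean, scale_sd=4.0):
    """Rescaled departures of actual from predicted y (higher = overachiever)."""
    b1, b2, c = fit_two_predictor_regression(x1, x2, y)
    residuals = [yi - (b1 * a + b2 * b + c) for a, b, yi in zip(x1, x2, y)]
    sd = pstdev(residuals)
    return [scale_mean + scale_sd * r / sd for r in residuals]


# Invented scores for six students; scale_mean=10.0 is a placeholder value.
sat_v = [400, 500, 600, 550, 450, 650]
sat_m = [450, 480, 520, 600, 500, 560]
gpa = [2.0, 2.8, 3.1, 2.9, 2.2, 3.6]
indices = achievement_indices(sat_v, sat_m, gpa, scale_mean=10.0)
```

Students at the two ends of the resulting list of indices would form the extreme groups.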

Overachiever-Underachiever Differences. In order to
accentuate whatever differences might exist in the characteristics and
questionnaire responses of overachievers and underachievers,
it was decided to compare the 37 most extreme overachievers
with the 37 most extreme underachievers. It is interesting to
note from the figures presented in Table 2 that our
overachievers had also been overachievers in high school. Although
their scholastic aptitude scores are somewhat lower than those
of the underachievers, their high-school grades were
significantly higher. (Since Freshman-Year College GPA is
undoubtedly very highly correlated with the achievement index, the
large t-value for the difference between the groups on this
variable is really meaningful only in demonstrating that the
extreme groups selected were actually quite disparate.)


TABLE 2

Comparison of Extreme Overachievers and Extreme Underachievers on SAT Scores and
Academic Achievement, Class of 1951

                                    Extreme          Extreme
                                 Overachievers    Underachievers
                                   (N = 37)         (N = 37)
Measures                          Mean    S.D.     Mean    S.D.    Difference      t

SAT-Verbal                        505     101      520      94         15        0.65
SAT-Mathematical                  471      90      487      60         16        0.89
High School Average, Adjusted*    1.32†   0.28     1.84†   0.45       0.52       5.89
Freshman-Year College
  Grade-Point Average             1.75†   0.35     3.39†   0.39       1.64      10.92

* Adjusted by the college on the basis of previous experience with graduates from
those high schools concerned.

† On this variable the higher the numerical value the poorer the achievement
represented.


TABLE 3

Age at High School Graduation of Extreme Overachievers and Extreme Underachievers,
Class of 1951

Age                       Number of Extreme    Number of Extreme
(Years and Months)          Overachievers        Underachievers

18-6 through 18-11                2                    1
18-0 through 18-5                 6                    9
17-6 through 17-11               12                   14
17-0 through 17-5                 8                   10
16-6 through 16-11                6                    3
16-0 through 16-5                 3                    0

Total                            37                   37


The data on age at high-school graduation and size of
high-school senior class shown in Tables 3 and 4 are presented for
their suggestive usefulness only. We are not prepared to state
categorically that overachievers will generally be found to have
graduated from larger high schools and at a younger age than
underachievers. Nevertheless, such a conclusion does not seem
unreasonable when such factors as grade-skipping and greater
competitiveness in larger classes are considered.

decided to select those in which the average response difference
between overachievers and underachievers amounted to at least
5 per cent of the total continuum length. This procedure
resulted in the selection of 61 items. The overachiever and
underachiever responses to these 61 items were then compared by the
Chi-square technique, resulting in the distribution of P values
shown in Table 5.

Discarding those items having a P of >.50 left us with a
maximum of 45 items with which to work out a scoring key.
Actually, three scoring keys were eventually used in scoring the
Class of 1952 questionnaires: one based on the 45 items with a
P of <.50 associated with Class of 1951 overachiever-underachiever
response differences, a second based on the 31 items
with a P of <.30, and a third using only the 19 items with a P
of <.20. This method which we adopted for selecting the
questionnaire items is essentially similar to a method which
Thorndike has described (6).
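
The two-stage selection described above (a minimum mean difference of 5 per cent of the continuum, then nested keys at three P levels) can be sketched as follows. This is only an illustration: the function names, the item numbers, and every data value are hypothetical, and the P values are taken as given rather than computed by chi-square.

```python
# Illustrative sketch only: names and data are hypothetical, not from the study.
SCALE_LENGTH = 4.0                 # a five-point item (1 to 5) spans 4 units
MIN_DIFF = 0.05 * SCALE_LENGTH     # "at least 5 per cent" of the continuum

def screen_items(mean_over, mean_under):
    """Keep items whose overachiever and underachiever means differ enough."""
    return [item for item in mean_over
            if abs(mean_over[item] - mean_under[item]) >= MIN_DIFF]

def build_keys(p_values, thresholds=(0.50, 0.30, 0.20)):
    """For each P threshold, list the items whose P value falls below it."""
    return {t: sorted(i for i, p in p_values.items() if p < t)
            for t in thresholds}

# Made-up means and P values for three items.
mean_over = {101: 3.40, 102: 2.10, 103: 4.00}
mean_under = {101: 3.80, 102: 2.15, 103: 3.50}
p_values = {101: 0.15, 103: 0.40}  # P values for the items that survive screening

kept = screen_items(mean_over, mean_under)
keys = build_keys({i: p_values[i] for i in kept})
print(kept)   # [101, 103]
print(keys)   # {0.5: [101, 103], 0.3: [101], 0.2: [101]}
```

Because the thresholds are nested, each stricter key is a subset of the looser one, which matches the 45, 31, and 19 item keys reported in the text.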

Preparatory to constructing the scoring keys, however, it
became necessary to make allowance for the important difference
in the social situations under which the members of
the Class of 1951 and of the Class of 1952 had answered the
questionnaire: the first group responded after they had achieved
their goal of college admission, while the second group responded
before they were admitted and even before they knew
whether or not they would be admitted. As was expected, this
difference in the conditions under which the questionnaire was
administered to the two groups was evidenced by a great deal
of disparity in average response of the groups to the same questionnaire
items. Of the 145 items, 90 were found to have significant
preadmission-postadmission response differences (CR of
2.00 or higher). This finding in itself has been considered of
sufficient interest to be reported upon separately (4). For present
purposes we will simply give an example of how the scoring
key was modified or adjusted to take account of the difference
in average preadmission and postadmission response.3

Item No. 113 asked: Do you feel that the higher the grades a
girl gets in college the more she will amount to after college? The
five responses that were provided for this item were (1) Yes,
(2) Probably Yes, (3) Uncertain, (4) Probably No, (5) No. The

3 The unadjusted scoring key for the 45 items with a P of <.50 was applied to the
questionnaires of the 74 extreme achievers in the Class of 1951 on whom the key was
based. In order to determine the relationship between this score and the achievement
index, a biserial correlation from widespread classes was computed, the standard
deviation of the questionnaire scores for the complete population being inferred from the
two tails by means of a formula given by Peters and VanVoorhis (5). Inclusion of only
the widespread classes made unnecessary the scoring of the questionnaires of all 355
students in the Class of 1951, a task which seemed unwarranted in view of the fact
that the key was being used with the same group on whom it had been based. This
correlation coefficient between questionnaire score and achievement index, an estimate
of the value for the full population of 355, was +.36.




avr-Mgr juntad minion respond (Class of jg^t) to this item 
by rhr mcr.uhirvm was j 40, and by the undcrnchieveta Wa& 
,1X1 l ; w *he entire flaw ^ ! 95 l the average response was 
3.61, but for the preadmission group (Class of 1952) the average
response was 2.71. Thus, in preparing the scoring key, a
shift of the preadmission group toward the "Yes" terminus of
the response continuum had to be taken into account. This
shift, or average difference, in this case amounted to .90 or
22.5 per cent of the entire length of the continuum (.90/4.00).
Presented below is the scoring key that would have been used
if the preadmission-postadmission difference had not been considered,
as compared to the adjusted key which was used:
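
The size of the shift can be checked with a few lines of arithmetic. The sketch below assumes a five-point item scored 1 to 5, so that the continuum is 4 units long; the function name is hypothetical, and the two means are the Item 113 figures as read from the text.

```python
def shift_as_percent(post_mean, pre_mean, scale_min=1, scale_max=5):
    """Mean shift between administrations as a per cent of the continuum length."""
    continuum_length = scale_max - scale_min   # 4 units on a five-point item
    return 100.0 * (post_mean - pre_mean) / continuum_length

# Item 113: 3.61 for the postadmission class, 2.71 for the preadmission class.
item_113_shift = shift_as_percent(3.61, 2.71)
print(round(item_113_shift, 1))   # 22.5
```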

5kw to be given fur indi- 
ratal ,, 0 a 3 y 

I VaiSjwMed Key < 4 ^ 4 on 

Adjirtal Key 1 or 4 j % 1 

As is evident from the above, the key was so devised that the
higher the score the greater the prediction of overachievement;
also, so as to give additional weight to extreme responses.
Using the key, each of the 346 Class of 1952 questionnaires was
given three different scores, as previously explained, on the
basis of the Class of 1951 overachiever-underachiever P values
of the differences. The means, sigmas, and reliabilities of these
three scores are shown in Table 6. The rather low split-half
reliability coefficients obtained indicate that the items included
are fairly heterogeneous in nature.

In July, 1949, after the members of the Class of 1952 had
completed their first year, we received their Freshman Grade-Point
Averages and were able to proceed with the preparation
of the criterion, the achievement index. This was prepared in
precisely the same manner as had been the case with the Class
of 1951, namely, by regressing Freshman-Year GPA on CEEB
Verbal and Mathematical Aptitude Test Scores. The formula
used m this instance was Y ^ **= 002 (Xi 4* X») + 4,610, where 
X\ « SAT-V and Xi «* SAT-M. The multiple correlation of 
SAT-V and SAT-M with Freshman-Year GPA was .52. The
intercorrelations of these measures are shown in Table 7, which
may be compared to Table 1 for an indication of the stability of
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the measures at this college for the two successive freshman 
classes 
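
One way to picture an achievement index of this kind is as the gap between a student's actual grade-point average and the average predicted from the sum of the two aptitude scores. The sketch below rests on two assumptions not confirmed by the text: that the index is the simple residual (actual minus predicted), and that a single weight is applied to SAT-V plus SAT-M. All names and data are hypothetical, and the direction of "good" depends on how the grade scale runs.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def achievement_index(sat_v, sat_m, gpa):
    """Assumed definition: actual GPA minus GPA predicted from SAT-V + SAT-M."""
    total = [v + m for v, m in zip(sat_v, sat_m)]
    slope, intercept = fit_line(total, gpa)
    return [g - (slope * t + intercept) for g, t in zip(gpa, total)]

# Made-up scores for four students.
sat_v = [500, 600, 450, 550]
sat_m = [480, 520, 500, 560]
gpa = [3.6, 3.1, 3.9, 3.2]

index = achievement_index(sat_v, sat_m, gpa)
print([round(v, 2) for v in index])
```

By construction the residuals average zero, so the index separates students who did better or worse than their aptitude scores alone would predict.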

Table 8 shows that, using the key on the Class of
1952 questionnaires, the correlations obtained for the three
scores with the achievement index were .12, .10, and .14, respectively.
Thus, in our first try-out of this new questionnaire
we had predicted by this extent the first year academic achievement
(grade performance relative to scholastic aptitude) of our
experimental group.

It will be noted in Table 8 that all of the three scores had 
correlations of essentially zero with adjusted high-school aver- 


TABLE 6

Questionnaire Scores, Class of 1952 (N = 346)

                                                Score 1      Score 2      Score 3
                                               (45 Items)   (31 Items)   (19 Items)

Mean                                             138.1         89.1         57.6
Standard Deviation                               1 * 9         10.0          8.5
Reliability (odd-even correlation, corrected)     5*           .41          .41

TABLE 7

Intercorrelations of CEEB Scholastic Aptitude Test Sections and Freshman-Year
Grade-Point Average, Class of 1952 (N = 346)

SAT V 

SAT M 

Frc&hnun 

Year 

GPA 

Man 

SO 

SAT-Vcrbal 

SA 1 -Mathematical 3 “ 

Frcshman-Ycar Grade 

Point Average .47 

39 

■47 

39 

5>5 

494 

3 51 

91 

80 

0 57 


age and with the mathematical aptitude test; scores 1 and 3 
also show zero correlations with the verbal aptitude test. This
is definitely encouraging. Whatever aptitudes are being measured
by the questionnaire we can be reasonably sure are other
than those currently measured by the more traditional measures.

The figures in Table 9 indicate the effect of adding Questionnaire
Score 1 to the other variables in predicting Freshman-Year
College Grade-Point Average for the Class of 1952. Including
the questionnaire score with the two SAT scores increased the
multiple correlation from .52 to .53, while adding it to the battery
of SAT scores and adjusted high-school average increased
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that the (jiM^tHtiuuirc *murr is a member ml the luttcry which 
is independent of the other predictor variables.

We feel that the difference in time of administration may have
made the key based on the 1951 Class (answering after admission)
less appropriate for the responses of the 1952 Class
(answering before admission) than might have been true if
both groups had responded under similar circumstances. The
rather low correlations obtained in the first experiment may
have been partially a result of this fact. We are now in the
process of preparing a scoring key for the Class of 1953
questionnaires (entered college fall of 1949) from the same women's
college. The key will be based on overachiever-underachiever
questionnaire response differences in the Class of 1952. No
rough-and-ready adjustment will be necessary in this case
to attempt to compensate for differences in time of administration,
as both classes (1952 and 1953) answered the questionnaire
before admission and in the same social situation
and under the same pressures. The results of this second
experiment will be available after the end of the 1949-50 academic
year and will be reported upon at that time.
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PATTFKNS OF HRRWSK IN’ I PVR OF ASPIRATION 

TASKS 


LOUIS D. COHEN 1

Duke University

Ferdinand Hoppe, one of the earliest workers with level of
aspiration, in discussing changes in goals following success and
failure, indicated that the direction and type of goal adjustment
are influenced by the true inner aims and aspirations of
the subject (11). Frank, seeking quantification of the concept,
defined it operationally as "the level of future performance in
a familiar task which an individual, knowing his level of past
performance in that task, explicitly undertakes to reach" (7,
p. 119). However, as Rotter points out (17, p. 467), Frank's
explicit factors were not necessarily the same as the implicit
ones in which Hoppe was interested. According to Gardner
(8, p. 67) his view may rather be considered the empirical
coordinating definition of Hoppe's concept.

As applied in most experimental situations the level of aspiration
is determined by asking the subject to perform some task
ranging from a simple motor one, such as throwing quoits, to a
complex one, such as solving arithmetic problems. Scores on
such tasks are made known to the subject and he is asked to
state what he "expects" to make next time. The difference between
this statement and the preceding achievement has been
described as the goal-discrepancy score, and when computed
for a number of trials, and the mean taken, such scores are called
the average goal discrepancy.
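
The definition just given can be written out in a few lines. In this sketch the function names and the sample figures are hypothetical; each discrepancy is the stated expectation for the next trial minus the score just achieved, and the average is the mean over all such pairs.

```python
def goal_discrepancies(performances, bids):
    """bids[i] is the goal stated before trial i; performances[i] is its result.

    Each discrepancy pairs the bid for trial i + 1 with the performance on trial i.
    """
    return [bids[i + 1] - performances[i] for i in range(len(performances) - 1)]

def average_goal_discrepancy(performances, bids):
    """Mean of the trial-by-trial goal-discrepancy scores."""
    scores = goal_discrepancies(performances, bids)
    return sum(scores) / len(scores)

# Made-up record of four trials.
performances = [20, 22, 19, 24]
bids = [18, 23, 23, 21]

print(goal_discrepancies(performances, bids))        # [3, 1, 2]
print(average_goal_discrepancy(performances, bids))  # 2.0
```

A positive average indicates goals set above past achievement, and a negative average indicates goals set below it.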

This score has been the most commonly reported aspect of
behavior in level-of-aspiration tasks. But it has also become
evident that the test situation in which level of aspiration is
determined lends itself to other observations. Not alone the
average goal-discrepancy score, but the consistency with which
goals are set at a high or low level, the frequency of shift in
goal level, the sequence of changes in goal level to success and
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failure, tliilhulnrs in imiik WhiK the direction of shift m goal 
level after success and failure all have been reported as additional
aspects of level of aspiration behavior (14).

Level of aspiration has thus tended to be used as a concept
describing a complex of behavior involved in a goal-setting
situation.

Recognition of the level of aspiration as a test situation from
which a number of observations could be made led to the formal
description of behavior with the task in terms of "patterns"
of response. The first major contribution along the lines of pattern
was that of Sears, who used the average goal-discrepancy
score to characterize her groups (21, 22). She described four
patterns: (1) low positive discrepancy score, (2) high positive
discrepancy score, (3) negative discrepancy score, and (4) a
mixed group in which no clear pattern of reaction was maintained.
In elaborating the behavioral correlates to these patterns
she observed that the low positive discrepancy-score
group included only those who were markedly secure in their
achievement, that the high positive discrepancy-score group
included those who were able to acknowledge rather freely their
own relative incompetence along certain lines, and the negative
discrepancy-score group included those who found it difficult
to admit to another person that they were striving for more
than they were able to achieve.

Sears used more than goal discrepancy score in characterizing
her subjects. We may note her observation that "it seems
reasonable to suppose that the aspiration level response forms
pari of a duster nf -wiaid fwuruJity attributes which may 
function as a whole in a number of different situations" (21,

p m)> 

Rotter, by extending the description of patterns, brought
into clearer focus the need for an evaluation of the variables in
the level of aspiration situation (20). He described nine patterns
of response: (1) low D 1 score pattern, (2) low negative
or very slightly positive D pattern, (3) medium high D
score pattern, (4) achievement follower pattern, (5) the step
pattern, (6) very high positive D score pattern, (7) high negative
D score pattern, (8) rigid pattern, (9) the confused or
breakdown pattern.

1 D is used by Rotter to denote the average goal-discrepancy score.

It can be noted by inspection of Rotter's patterns, he has
added a characteristic to Sears' goal discrepancy score by
including patterns describing the manner of shifting or reacting
to success and failure in the level of aspiration task. Patterns
4 (achievement follower), 5 (step), 8 (rigid), and 9 (confused
or breakdown) are concerned with describing the method of
adjusting goals to success and failure, and say nothing about
the height of goal level setting. For example, the rigid pattern
(pattern 8), which was characterized by Rotter as involving
few if any shifts in goal level, could have set goals either high
or low. In the first case the patterns would be rigid and low
negative D score (patterns 8 and 2), and in the other case they
would be rigid and medium or very high positive D score (patterns
8 and 3 or 6). Presumably the behavioral implications
would be different in each case.

Duffy, in attempting to establish a systematic framework
for the description of personality, has suggested that "personality
be described in terms of the direction and the energy
mobilization of response, since all behavior shows variation in
goal clirciimn and tn miemuty " <f«, p 1%) She goes on to 
observe that "The broad outlines of personality can be usefully
sketched in terms of what the individual approaches or withdraws
from, with what intensity, and with what consistency"
(p i$9$ These nlncr*atnms arc appropriate to our .analysis of 
the variables in the lei el of aspiration task*! 

In the light of Rotter's attempts at pattern formulation and
Duffy's concept of two major variables of behavior, it seemed
probable that level of aspiration behavior would be more fully
described by the use of two major variables rather than one.
The problem set for this paper was the examination of behavior
in a level of aspiration task with a view to examining
the specific measures used, and defining the patterns of response
that might emerge from an integration of the two major variables
suggested by the literature: (1) goal level setting (height
of average discrepancy score), and (2) method of adjusting goal
levels to success and failure.





Subjects, Task, and Scores

Subjects — We selected as subjects for this study a number
of patients from the outpatient and inpatient clinics and wards
of Duke Hospital. The patients were primarily those who had
been referred to the neuropsychiatric department for consultation
or treatment, but also a large number of medical patients
referred for psychosomatic study to the psychosomatic clinic
of the department of medicine and neuropsychiatry.

Group Criteria — The criteria for selection were as follows:
white patients between the ages of 16 and 55. Additional
considerations were: dividing the group into equal numbers of
male and female; providing for approximately the same age
distribution in the male and female groups; providing for a
distribution of diseases of approximately the same type for the
male and female groups.

Selection of Patients — Study of Table 1 reveals that in a number
of aspects we failed to attain the criteria listed above. Our
male group is largely the type of patient found in psychiatric
clinics, the female largely the type found in medical clinics.
The male group also included a large number of veterans and,
in general, was somewhat younger than the female group.

The group as a whole, nevertheless, seems to be representative
of those diseases reported as having psychological attributes
either etiologically or symptomatically (3). This would
seem to have been an advantage since our aim in the study of
this group was to get at certain modes of adjustment which
might have been more dramatically in evidence in a patient
group with emotional difficulties than in another group in which
such difficulties were less obvious.

Task — Among the various level of aspiration techniques described
in the literature (10, 14), the target-aspiration board
used by Rotter seemed particularly appropriate as a clinical
instrument. The board is easily carried, can be set up near a
patient's bed if he is nonambulatory, does not require much
energy output, is interesting and challenging, and seems to
evoke active participation on the part of the patient. In addition,
considerable opportunity for quantification is possible because
of the wide variety of measures that have been suggested
and used.


The board we used followed Rotter's specifications. This is
a |*™ml vine j«* Bong with a ^tpiarc groove down the center. 

A steel ball is hit along the groove by a stick resembling a
miniature billiard cue. Regularly spaced depressions preceding
the numbered units, and also one placed in the center of each
numbered unit, slow down the speed of the ball and provide
a resting place for it when it comes to a stop. The score is
dependent upon how close to the central unit the ball comes
to rest, regardless of the direction. The central unit, painted
in white with the black number ten on it, counts 10 points.
The units just above and below count 9 points, the next ones
8, and so on down to 1, the value decreasing as the distance
front the uu tease* The unite under io are painted alter¬ 
nately blue m\ gray u*b P-tU* *M>, ,, ,, , . 

Rotter calls for the use of a one-inch steel ball, but because

of the p&i war difficullv in gening supplies we had to lw con¬ 
tent mth the itee of a fl w ball 1 hte had the effect of making 
control of the lull more difficult wml consequently cd making 
(he problem for (lie subject* more difficult. It was felt that 
dus additional Mow was advantageous l« our experiment and 
did not contravene Uniter's criterion that the problem be 
within the level of achievement and not so difficult as to innKe 
the ftwbjjca fed *'.U one extreme of idem without having 
any immediate romparteon with other* 1 ' (i8, p 411). in an- 


anti shi umucwwiwi mni nwiu ” 

lecm did not find the task either unusually difficult or easy. 

'Hie mstruaioni used were identical wnh those described 
by Rotter; After appropriate inti*, the subject was asset! co 
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state the score (aspiration bid) he expected to achieve in the 
next series of five trials. The results attained (performance)
were penalized by giving no credit for achievement above the
aspiration and deducting two points for every point that the
achievement fell below the aspiration. This score was known
as the earned score, which was conspicuously announced to the
subject. The task continued for twenty series with a rest period
after the tenth. After the twentieth series the subject was
interviewed as to his reactions to the test.
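
The penalty rule can be illustrated with a small function. One point is an assumption here: the text does not say whether the two-point deduction is taken from the aspiration bid or from the performance, and this sketch deducts it from the bid. The function name and the sample figures are hypothetical.

```python
def earned_score(bid, performance):
    """Earned score for one series under the assumed reading of the penalty rule."""
    if performance >= bid:
        return bid                      # no credit for achievement above the bid
    shortfall = bid - performance
    return bid - 2 * shortfall          # two points off for each point of shortfall

print(earned_score(20, 25))   # 20
print(earned_score(20, 20))   # 20
print(earned_score(20, 17))   # 14
```

Under this rule a subject gains nothing by bidding low and exceeding the bid, and loses heavily by bidding high and falling short, so the earned score is largest when the bid matches the performance exactly.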

Scores — The scoring was accomplished through the use of
certain quantitative relationships suggested by Rotter and
Klugman (12, 13) and some that were original to this study.
However, subsequent analysis suggested that different measures
were apparently measuring the same thing. The measures
we began with were as follows:

Ds—the mean difference between the earned score and the
immediately subsequent aspiration bid for the series of
twenty trials. (The actual N here is 19.)

Dp—the mean difference between the actual performance
and the subsequent aspiration bid for the series of twenty
trials.

Ds1—the mean difference between the earned score and the
subsequent aspiration bid for the first ten of the twenty
trials.

Ds2—the same as Ds1 for the second ten of the twenty trials.

Dp1—the mean difference between the actual performance and
the subsequent aspiration bid for the first ten of the
twenty trials.

Dp2—the same as Dp1 for the second ten of the twenty trials.

# successes—the number of times in twenty that the subject's
performance equalled or exceeded the aspiration bid.

# failures—the number of times in twenty that the subject's
performance was below the aspiration bid.

Direction of shift after failure or success—the number of times
a subject raised, lowered, or used the same aspiration bid
after success or failure.
ft shifts—the number of times m twenty that the subject 
changed his nspirntion bid (eg., 20, at, 10, 20, s< = 3). 
# unusual shifts—the number of times the subject lowered
his aspiration bid after success or raised it after failure.
Note was also made of the different situations, i.e., after
success or after failure.

# repressions—the number of times the subject failed to announce
his aspiration bid before beginning a new series.

J Bcorc—judgment see" -' L ’ ~c- the 

aspiration bid and [C - • ‘ d|rec ' 

tion. (Attainment “<y. \ 
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r * doll (hr numK’J t f dull"- <om cried m jicrceji tacts 
iN - U/) 

% conforming shifts—number of shifts in which the subject
raised his aspiration bid after success and lowered it after
failure, converted into percentage.

% same or absence of shift—number of times in which subject
failed to change his aspiration bid, converted into
percentage.

% deviant shifts—number of shifts in which the subject
lowered his aspiration bid after success and raised it after
failure, converted into percentage.
W dtffrfcw iHcJi (be ftsnnwr oi different nuruhm used as 
nApiMiK'fl M-i (eg it, i \ ac, jr% at -* aj 

In addition to these measures certain other relationships between
measures were investigated:

Ds-Dp—the difference between Ds and Dp.

Ds1-Ds2—the difference between the first and second halves
of Ds.

Dp1-Dp2—the difference between the first and second halves
of Dp.

Ds1-Dp1—the difference between the first half of Ds and Dp.

Ds2-Dp2—the difference between the second half of Ds and
Dp.
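
The shift measures listed above can be sketched from a record of bids and outcomes. The sketch assumes that every percentage is taken on the number of bid-to-bid transitions (19 in a twenty-series record), which keeps the conforming and deviant percentages summing to the percentage of shift. The function name, the dictionary keys, and the sample record are hypothetical.

```python
def shift_measures(bids, successes):
    """Classify each change of bid by the outcome of the series before it.

    bids[i] is the aspiration bid for series i; successes[i] is True when the
    performance on series i equalled or exceeded that bid.
    """
    conforming = deviant = same = 0
    for i in range(1, len(bids)):
        change = bids[i] - bids[i - 1]
        if change == 0:
            same += 1
        elif (change > 0) == successes[i - 1]:
            conforming += 1     # raised after success or lowered after failure
        else:
            deviant += 1        # lowered after success or raised after failure
    n = len(bids) - 1           # number of opportunities to shift
    return {
        "shifts": conforming + deviant,
        "pct_shift": 100.0 * (conforming + deviant) / n,
        "pct_conforming": 100.0 * conforming / n,
        "pct_deviant": 100.0 * deviant / n,
        "pct_same": 100.0 * same / n,
    }

# Made-up record of five series.
bids = [20, 21, 20, 20, 22]
successes = [True, False, True, False, True]
result = shift_measures(bids, successes)
print(result["shifts"], result["pct_conforming"], result["pct_deviant"], result["pct_same"])
# 3 50.0 25.0 25.0
```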


hwi\w i wf Anuri 

The Ds Score — The traditional scores used in level of aspiration
are the "goal discrepancy score" and the
"number of shifts." However, in examining the test situation
we noted that the subject after the completion of a series
of trials was confronted with two scores: the first, his actual
performance score, and the second, the earned score. While the
earned score was emphasized by the examiner there was no
guarantee that it was of equal importance to the subject. We
felt it highly probable that the subject might react to the performance
score rather than to the earned score, or perhaps to
some combination of the two. We therefore used both a Ds and
Dp measure for each subject. However, on computing the Pearson
product moment r for these two variables we secured an r of
rj 4 # fn^rmr group It ihir* seemed that the two measure*! served 
identical purposes and we dropped the use of the Ds score.

The Ds1-Ds2, Dp1-Dp2 — The differences between the first and
second halves of the test as far as Ds and Dp scores were concerned
were not remarkable. However, there is apparently
greater variability in the second half, as noted by the larger
SD stoics m Ds2 ,iikI Dpi I’lin ih also suggested by correla¬ 
tions of 6n between the Dpi and l)p2 stores, and of 649 be¬ 
tween the Dsi and I)&2 scores 

An examination of scatter diagrams suggests that only a
small number of subjects varied between the first and second
halves, but that these cases varied markedly. Apparently something
caused this marked change in behavior from one part of
the test to the next for these subjects.

The differences between the means are not significant (t =
.101, .156), but the differences between the SD's show significance
at the 1 per cent level of confidence for the Ds score (t = 2.67)
and at the 10 per cent level of confidence for the Dp score
(t = 1.86).

These findings suggest that there is greater variability in the
goal-setting activities in the second half of the test than in the
first half. This differs from Rotter's results, which show greater
variation in the first half, presumably as a function of the
amount of time it takes the subject to find his level. Where
Rotter obtained a decrease we found an increase in variability.
One possible explanation may be the tendency of some subjects
to tire of the task about halfway through, or to become too
tense. Some expressed a wish to terminate the session at about
the fifteenth series. It may be that at that point the characteristic
behavior gave way to other influences and the subject
utilized another type of behavior. Other possibilities may have
to do with differences in our population or task from those of
Rotter.

# Successes — As Rotter suggested, and contrary to Klugman,
we found a very high correlation between the goal-discrepancy
score (Dp) and the number of successes and failures.
Since the number of trials is fixed, successes and failures are
reciprocal numbers. Our correlation between the number of
successes and the Dp score was -.934, which agrees with Rotter's
findings of .90 and .88.
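
The point about successes and failures being reciprocal can be shown with a short product-moment computation. With a fixed number of series, failures equal the total minus successes, so the correlation of Dp with failures is the same size as its correlation with successes and opposite in sign. The function name and the data below are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

N_SERIES = 20                                  # series per subject, as in the task
dp = [4.2, 2.5, 1.0, 0.3, -0.8, -1.5]          # made-up Dp scores for six subjects
successes = [3, 6, 9, 11, 14, 16]              # made-up success counts
failures = [N_SERIES - s for s in successes]

r_success = pearson_r(dp, successes)
r_failure = pearson_r(dp, failures)
print(round(r_success, 3), round(r_failure, 3))
```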

Thus our Ds, Dp, number-of-successes, and number-of-failures
scores give us essentially the same information. This suggests
that the relative height of the aspiration settings has the
function of controlling the successes: the higher the aspiration
score the lower the number of successes, the lower the aspiration
score the higher the number of successes. This very relationship
emphasizes the previous findings in level of aspiration
studies of low goal setting for the purpose of attaining
success, and of high goal setting in which possibly the statement
of goals is itself the goal.

9 Shifu Turning h«w in the number of shifts, we find a 
scry 1 low iotrrbiion f s with Dp awl a correlation of 
with ihr number r+l It ihm appears tltai there is very 

little linear relationship between Dp, successes, and the number
of shifts. However, we examined the relationship between the
number of successes and the number of shifts for curvilinearity
and found an *? o( i lest mu the Mfjnitiiuwc through \\ we 
found a\ ? of He with a Pnf ' suggestingthe probability 
that dvr mm ihnc nr trial am ship held at a confidence level in 
excess of i |*Tic*nr t\irK flu* c 3 table ofPcirrs and VanVoorlna 
OfO nr ohtamvd an *“of UiU. which a bo r^imb the one per 
cent confidence level. The curvilinear relationship noted here
seems to warrant special attention.
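
A relationship can be strongly curvilinear while showing almost no linear correlation, which is what the correlation ratio (eta) is meant to detect. The sketch below uses the usual definition of eta as the square root of the between-group sum of squares over the total sum of squares; the grouping of subjects into three shift levels and all the values are hypothetical and are not the study's data.

```python
from math import sqrt

def correlation_ratio(groups):
    """Eta for y values grouped by level of x: sqrt(SS between / SS total)."""
    all_y = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_y) / len(all_y)
    ss_total = sum((y - grand_mean) ** 2 for y in all_y)
    ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
                     for ys in groups.values())
    return sqrt(ss_between / ss_total)

# Made-up success counts for subjects with few, middling, and many shifts.
successes_by_shift_level = {
    "few shifts": [14, 15, 16],
    "middle": [9, 10, 11],
    "many shifts": [14, 15, 16],
}

eta = correlation_ratio(successes_by_shift_level)
print(round(eta, 3))   # 0.945
```

In this U-shaped example the group means are equal at both ends, so a straight line fitted through them would be flat even though eta is high.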
Our results suggest that both those subjects who shift a great
deal and those who hardly shift at all have the greatest number
of successes, that those who shift the mean number of times
range from the mean number of successes to slightly below.
Thus the number of shifts tends to distinguish three types of
behavior: (1) Those who shift hardly at all. This is a somewhat
rigid type of behavior and may have two results as regards the
number of successes. If goals are set low and maintained there,
the number of successes is high; if goals are high the number
of successes is low. Our group contained mostly the low goal
setters. (2) Those who shift a great deal. This type of behavior
conforms to every change in outer pressure. The attempt to
attain success by changing goals reflecting the immediate status
of achievement tends to insure a high number of successes. (3)
Those who shift within the range of ± 1 SD of the mean number
of shifts; this behavior apparently involves holding a self
level of competence as a goal and reacting to this level, shifting
up or down depending upon the trend of the performance rather
than on the momentary success or failure.
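
The three types of shifting behavior can be separated mechanically once cutting points are chosen. The sketch below assumes the middle band runs one standard deviation either side of the group mean and uses the population standard deviation; both choices, the labels, and the data are illustrative assumptions rather than the study's procedure.

```python
from statistics import mean, pstdev

def shift_group(n_shifts, all_shifts):
    """Place one subject's shift count relative to the group mean and SD."""
    center = mean(all_shifts)
    spread = pstdev(all_shifts)          # assumed: population SD of the group
    if n_shifts < center - spread:
        return "hardly shifts"
    if n_shifts > center + spread:
        return "shifts a great deal"
    return "within one SD of the mean"

# Made-up shift counts for eight subjects.
all_shifts = [2, 5, 9, 10, 11, 12, 16, 18]
labels = [shift_group(n, all_shifts) for n in all_shifts]
print(labels[0], "|", labels[3], "|", labels[-1])
```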
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of ■khtji W> wrrc particularly interested in the 
subject's reactions to success and failure as they influenced
goal setting, and considered three possible reactions: raising,
lowering, or holding to the same aspiration level. We felt that
by taking the percentage of those reactions that conformed to
the previous performance (conf %—raising after success and
lowering after failure) a useful measure might be secured. (The
realism score of Adams [1].) We also designated two similar
measures: the percentage of deviant shifts (dev %—raising
after failure and lowering after success), and the percentage of
absence of shift (same %), which we tentatively related to
rigidity of behavior. The combined conforming and deviant

t ABIT a 

C&rtUfi&'ti fatmir* (rrfa i*. WfMwn %f l fjf rfipwelton Ptiftmante in the 

f rjuv/rnu Uts-tof) S ** $0 
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percentages were, of course, equal to the total percentage of 
shift 

As noted in Table 2, the correlation of these measures with
Dp and # successes is not significant (9, p. 299). While the
correlations of conforming %, same %, and deviating % with
the number of shifts are high, these relationships conform quite
closely to expectations. The conforming percentage will be
highly correlated with the percentage of shift, since it accounts
for all shifting except the deviant. Our correlation of .880 confirms
this. The correlation of deviating % with % shift is .587,
and seems unusually high. This suggests that those who shift
excessively have a greater tendency to shift inappropriately.
Our three measures (% conforming, % deviating, % same) do
not seem related to our Dp and # successes measures, but
rather to the number of shifts.
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* AVp/or i',w Wither measure of interest is the number 
r 4 rrprr ■< dim rHln T + dir anxiety the subject may have 

rr^ardma *,‘m 1 'witmg *z f , Vs " ^gnificaiil relationships with 

lhr < fh^j imu'rnrr* \*rrr (rund, with the exception ol a rela- 
lumdrrp 1*1 dir f i i Mipufuant ,it the 

Jrvrl ol < <4nft-> 5 * 011 cd 1 hr refl tiinndnp we found is therefore cun' 
'utlrrrd iKiirttiTthy I hr iruplu Mi»*n wim tn hr tlic higher 
i!?r mitmlKT iMI appmpmte Cult's, the lower the Amount ol ion- 
<cin u\ rr t h^ne oj • d netting 

"fSo^r 1 hr pid^mToi* <mur rrlrr u iu.il deviation m per- 
foriKarur Jimjh mpmiUMi hid* termed to hear no particular 
rcktinn«fup to am of the memoirs Iirteinlorc roj^irred It may 
|w- that thrj MMfrn iiir^wruiii «*mrthmi? quite diflcrcnt 
Summary — We ended with the following measures
which had some significant value for our study: Dp, # successes,
# repressions, conforming %, same %, deviating %, % shifts,
and the judgment score. There seemed to be a grouping of
scores measuring the same thing. The Dp and # successes
seemed to relate to goal level setting; the % conforming, same
%, deviating %, and % shift seemed to relate to the method
of adjusting goals to success and failure; the # repressions
seemed also to be related to the method of adjusting goals to
success and failure, but to a limited degree; and the J score
seemed to be an independent value.

ns 

Rotter, and Sears before him, pointed out, however, that
some of the unique characteristics of the performance in the
level of aspiration situation were lost by taking the means of
the results of the various measures, since wide individual variation
in the test situation was frequently lost in this process. In
order to offset the extreme variations that would obscure trends
in the performance, rating scales were developed to extract the
meaning of the performance from the data.

Rating Scales for this Study. We noted that goal-level setting
behavior varied from high positive goal discrepancy scores to
negative goal discrepancy scores. In order to distinguish
between groups this continuum was divided into seven parts on
the basis of standard deviations of the means of Dp and
# successes. In considering methods of adjusting goals to
success and failure we characterized five types of behavior
(rigid, arbitrary, flexible, conforming, and achievement
following), four of which seemed evident from our data, and
one, achievement following, borrowed from Rotter, which we felt
might occur in some subjects. Thus, each subject's performance
could be rated for goal level setting and method of adjusting
goals to success and failure.

Goal Level Setting. The rating scale for goal level setting
was established by dividing the various scores for height of
goal level into seven step intervals, based on the standard
deviations of the experimental group's performance. Each step
interval was described in terms of the behavior that seemed
relevant for it, in order to make the rating more than a
clerical checkoff, and to make it possible to evaluate properly
the few extreme scores that could erroneously distort the mean
of any single record. The seven step intervals had these
descriptions: (1) very high positive; (2) high positive;
(3) positive; (4) plus-minus; (5) negative; (6) high negative;
(7) very high negative. A sample of these, that for the high
positive step interval, reads as follows:

j SrMmg in' di in * 3 i|j 3 rfahoe adurv eincnt Irvcl, goal 
celling ffcrrbasn S«n "inniji the god nwll, considerable cxpendi- 
uire ol effort iomM wr 1 !!, rarnc-i drurc to improve performance 
alini^i wtihosii rrlfrfrjts.r !<i Hie actual performance 

Other steps of goal level setting were similarly described.

Method of Adjusting Goals to Success and Failure. From our
observations it appeared that this concept involved a number
of variables which we defined operationally. The variables we
used were: (1) responsiveness, (2) conformity to the stated goal
objectives, and (3) appropriateness of action.

By responsiveness we meant the reactions of the subject to
the situation he was in and to changes in this situation. High
responsiveness meant that each change in the situation was
reacted to in order to deal with or to handle it in some way.
Low responsiveness meant that changes in the situation were
ignored or were dealt with by avoidance.

By conformity to the stated goal objectives we meant
responding to instructions by carrying out their intent. High
conformity involved adhering to the rules and the framework of
the task. Low conformity involved changing the rules and
framework of the task.

By appropriateness of action we meant responding to the
situation by reference to previous success and failure in the
task. High appropriateness involved responding by action in
keeping with previous success and failure in the task; low
appropriateness involved responding by action devoid of
reference to previous success or failure in the task, the
subject responding by inappropriate behavior.

These variables were interrelated and the following charts
prepared, showing the values of the variables for the five
methods of adjustment we had devised (tables j, .|).

[Charts giving the values of the variables for each method of
adjusting goals to success and failure are not recoverable from
the source.]

The description of the ratings follows:

1. Rigid. Low responsiveness to test situation, low number
of shifts, patterned or stylized shifts. Subject does not
conform to the stated objectives of the task of earning the
highest score, but holds to a set mode of response. Responses
are not based on previous experiences of success or failure.
Momentary changes in the situation are apparently overlooked.

2. Arbitrary. High responsiveness to the test situation, high
number of shifts. Subject does not conform to the stated
objectives of the task, and tends to respond by goal setting
that is characterized by shifting up after failure and down
after success. Responses do not conform to previous
experiences of success or failure.

3. Flexible. Moderate responsiveness to the test situation,
average number of shifts. Subject conforms to the stated
rules and framework of the task, adjusts to previous success
and failure by an estimation of probability rather than by
holding exactly to previous attainment.

4. Achievement following. High responsiveness to test
situation, very high number of shifts, low conformity to rules
and framework of task, sets new framework. Subject conforms
exactly to previous success or failure by responding in the
same direction as the immediately preceding achievement.

5. Conforming. High responsiveness to test situation, high
number of shifts, high conformity to goal objectives by
adherence to the rules and framework of the task, high
adjustment to previous experience of success and failure,
strongly influenced by momentary experiences.

Reliability of Ratings. To check the reliability of both
ratings a folder was prepared including (1) a typewritten copy
of the instructions and descriptions of the ratings, (2) a table
of norms for the various measures previously described, and (3)
a copy of the original protocols of the test for each subject.
This folder was given to one of our colleagues,¹ who was
familiar with the general outline of level-of-aspiration
testing, but who had no special knowledge of the procedure or
purposes of the experiment. No coaching sessions were held. Thus
the reliability ratings are based on the written instructions
and the rater's interpretation of them. These ratings were
compared with the ratings made by the experimenter using the
same criteria.

The reliability ratings on goal-level setting gave a Pearson
product-moment r of gfij. There were seven possible categories
and complete agreement in ratings was found in 76 per cent of
the cases. In the other 24 per cent of the cases the raters
disagreed by no more than one category.
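
The agreement figures described above (a product-moment
correlation between two raters, the share of cases with complete
agreement, and the share differing by no more than one category)
can be computed as in the following minimal sketch. The ratings
shown are hypothetical and are not the study's data; the function
names are illustrative only.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def agreement_summary(rater_a, rater_b):
    """Return (share of exact agreement, share within one category)."""
    n = len(rater_a)
    diffs = [abs(a - b) for a, b in zip(rater_a, rater_b)]
    exact = sum(d == 0 for d in diffs) / n
    within_one = sum(d <= 1 for d in diffs) / n
    return exact, within_one


# Hypothetical ratings on the seven-step goal-level scale (1 to 7).
experimenter = [1, 2, 2, 3, 4, 4, 5, 6, 7, 3]
colleague = [1, 2, 3, 3, 4, 5, 5, 6, 7, 3]

r = pearson_r(experimenter, colleague)
exact, within_one = agreement_summary(experimenter, colleague)
print(f"r = {r:.3f}, exact = {exact:.0%}, within one = {within_one:.0%}")
```

Reporting exact and adjacent agreement alongside r is useful
because a high correlation alone does not show whether the two
raters used the same step intervals.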

In getting the reliability of the method of adjusting goals to
success and failure ratings, we had recourse to coefficients of
contingency, since we could not presume that our categories
were continuous. The coefficient of contingency was C = .823
and, corrected for 5 X 5 tables (16, p. 398), C = .920. With
five possible categories there was an 84 per cent complete
agreement. The reliability of ratings was also high for this
variable.

•tiiflUrhil m na«wlc i« Ur Morris Krwcmtinj clinical psychologist,
VA 1 WgmgV Ky
The Patterns That Emerged. In Table 3a we have the array
of the experimental group as it distributed itself in light of
the two ratings used. As can be noted, achievement following
had no candidates for its interval. It may also be questioned
whether the use of seven step intervals is a refinement that is
particularly helpful in discriminating behavior, and whether a
smaller differentiation might not be more appropriate.

We therefore condensed the ratings of the height of goal-level
setting by including in one interval those above +.67 SD and,
in another interval, those below -.67 SD. The group between
+.67 and -.67 SD was combined in the other interval.
By the elimination of the achievement following classification
we had four methods of adjusting goals to success and failure.
Thus with three levels of goal level setting and four methods
of adjusting goals to success and failure we ended with twelve
distinct patterns of behavior (Table 3b).

The patterns may be entitled as follows:

I. High positive goal level setting and rigid adjustment.

II. Medium goal level setting and rigid adjustment.

III. High negative goal level setting and rigid adjustment.

IV. High positive goal level setting and arbitrary adjustment.

V. Medium goal level setting and arbitrary adjustment.

VI. High negative goal level setting and arbitrary adjustment.

VII. High positive goal level setting and flexible adjustment.

VIII. Medium goal level setting and flexible adjustment.

IX. High negative goal level setting and flexible adjustment.

X. High positive goal level setting and conforming adjustment.

XI. Medium goal level setting and conforming adjustment.

XII. High negative goal level setting and conforming adjustment.
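
The condensed scheme (three goal-level intervals crossed with
four methods of adjustment) can be sketched as a small lookup.
This is an illustration under stated assumptions: the cut at .67
SD is taken to be measured from the group mean of the goal-level
score, and the function and constant names are hypothetical.

```python
GOAL_LEVELS = ("high positive", "medium", "high negative")
METHODS = ("rigid", "arbitrary", "flexible", "conforming")
ROMAN = ("I", "II", "III", "IV", "V", "VI",
         "VII", "VIII", "IX", "X", "XI", "XII")


def goal_level(score, group_mean, group_sd, cut=0.67):
    """Place a goal-level score in one of the three condensed intervals."""
    z = (score - group_mean) / group_sd
    if z > cut:
        return "high positive"
    if z < -cut:
        return "high negative"
    return "medium"


def pattern(level, method):
    """Roman numeral of the pattern: methods vary slowest, levels fastest."""
    index = METHODS.index(method) * len(GOAL_LEVELS) + GOAL_LEVELS.index(level)
    return ROMAN[index]


# Hypothetical subject: score 10 in a group with mean 4 and SD 5.
level = goal_level(10, 4, 5)
print(level, pattern(level, "flexible"))
```

With a normal distribution, cuts at plus and minus .67 SD would
place roughly half the group in the medium interval and a quarter
in each extreme interval.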

Inspection of the table of patterns (Table 3b) for our
experiment finds no subjects using pattern X, one subject
using pattern IX, and two using patterns III and XII. With
the limited number of subjects in our experimental group,
conclusions as to the acceptability of patterns cannot be made.
It may, however, be noteworthy to observe that, with the
exception of the arbitrary adjustment, high negative goal-level
settings were not often used by our group.

Positive goal level setting has indeed been reported as the
more typical response on level of aspiration tasks in our
Western culture by most experimenters (19, ty). As we have
observed elsewhere (4), in a group of well adjusting college
girls we found no negative goal level setting. But negative
setting has been reported by Arluik (2) for epileptics and by
Miller (15) in conversion hysteria. We have also reported
negative goal setting in hypertensive patients ijh. Thus, the
presence of negative goal level setting in our experimental
group may be attributed to the hypertensive patients within the
sample.

A*i ft»t she umaIci t»f adjujrimg gsuls to success and failure, 
we )u\c rf}»«ricd eUcwtlierv ft,I that die college girl group used 





ihr *vid- «iwtoffmmfr*, and drtiblr methods, and not the tirbi 
Iran In thr Misif Mudy our hypertensive patients used the 
jm<l mn fowling methods predominantly and our ast |v 
fliMtH ^r^sp preferred she ngid mcwle of adjustment 
It would thus appear that negative goal level setting and
the arbitrary method of adjusting goals to success and failure,
by their occurrence among certain patient groups and absence
from well adjusted groups, may be suggestive of less
acceptable methods of adjusting.
We may also raise a question as to the acceptability of high
positive goal level setting in view of the observations of
Sears, who described it as an effort to secure commendation
rather than a realistic goal.

In consideration of the above, a tentative formulation of the
behavior consistent with each pattern as well as some
suggestions as to the motivation for such behavior might be
offered in the following:

I. Setting goals well above achievement, sticking to such
goals despite failure to achieve. Subject appears to
place greater value on high goal statement than on
achievement; is easily threatened by potential failure;
sets and sticks to high goals as a means of gaining
reward for effort.

II. Setting goals within the range of achievement, sticking
to such goals despite fluctuations in achievement.
Subject sets reasonable goals, appears to fear failure, to
handle fear by cautious shifts in goals.

III. Setting goals well below achievement, sticking to such
goals despite easy success in attainment. Subject
appears to fear failure, insures success by an extremely
low bid which is persisted in in the face of frequent
assurance of ability. Fear of failure seems overwhelming
and is dealt with by retreat to safe but unreal position.

IV. Setting goals well above achievement; frequent changes
in goal level with retreats from success and
overcompensations for failure. Subject appears to strive for
the appearance of success, but each trial is a challenge that
calls for emergency adjustment. Having succeeded, he
fears he will not be able to do well and retreats;
having failed, he wishes to do well and overcompensates
by high goal statement.

V. Setting goals within the range of achievement; frequent
changes in goal level with retreats from success
and overcompensations for failure. Subject appears to
set reasonable goals, but such goal setting seems
fraught with tension for him. Having succeeded, he
fears he will not be able to do as well and retreats;
having failed, he wishes to do well and compensates
by high goal statement.

VI. Setting goals well below achievement; frequent
changes in goal level with retreats from success and
overcompensation for failure. Subject appears to fear
failure and sets low goals to insure success, but such
low goal setting and its attendant success are not
satisfying and call for strong assertion of high goals.
He seems so uncertain, however, that the assertion of
goal is overwhelmed by the retreat from a potentially
precarious position.

VII. Setting goals well above achievement, changing goals
in light of trends in achievement. Subject appears to
want to do well. Apparently recognizing the unrealistic
height of goal level, he uses it primarily as incentive
for better performance.

VIII. Setting goals within the range of achievement, changing
goals in light of trends in achievement. Subject
appears to keep goals within a modest range and to
be able to view his own performance with some
objectivity. Goal setting seems primarily a rational
judgment of probability.

IX. Setting goals below achievement, changing goals in
light of trends in achievement. Subject appears to
be cautious and conservative in goal-level setting, yet
sufficiently objective to resist momentary changes or
holding rigidly to one goal.

X. Setting goals well above achievement; changing goals
in conformity with previous success and failure. Subject
appears to strive for high achievement, shifting
continually to take advantage of every assurance and
to retreat with every rebuff. Earnest desire to do well
overcomes to some extent the fear of failure.

XI. Setting goals within the range of achievement; changing
goals in conformity with previous success and
failure; constantly advancing and retreating, holding
as closely as possible to achievement.

XII. Setting goals well below achievement, changing goals
in conformity with previous success and failure.
Dominated by a desire to succeed, subject plays safe in
goal setting. Desire to do well is overwhelmed by
fear of failure.


Summary 

Descriptions of patterns of behavior on level-of-aspiration
tasks have emphasized the height of the average goal-discrepancy
score as criterion. Further observations on the manner in which
goals are set have indicated the propriety of considering more
than one major variable in describing behavior in level of
aspiration tasks.

Using fifty subjects on the Rotter aspiration board and task
and analyzing the various scores that had been previously
suggested as well as others new with this study, three major
variables seemed to emerge from the table of intercorrelations.
Of these, goal level setting and the method of adjusting goals
to success and failure were the ones investigated. Judgment
was the third variable, but its relevance was not thoroughly
explored.

Rating scales were developed for the two variables in order
to insure the best assessment of the test protocol. Ratings
appeared to be reliable.

A condensation of the table of distribution of subjects
showing the relationship of the two variables (goal-level
setting and method of adjusting goals to success and failure)
resulted in the denotation of twelve patterns of response.
Reference to other populations to whom this scheme of patterns
was applied suggested the usefulness of the method.

EVALUATION OF AN OPTOMETRIC TEST

A. R. LAUER
Iowa State College
and
WILLIAM B. MICHAEL
San Jose State College

Background 

Various studies have been made of professional aptitudes
since Moss and others developed a test of medical aptitude at
George Washington University. Although each of the several
professional schools of optometry operating in the United States
and Canada has used some form of entrance examinations or
selective techniques to admit applicants, only one previous
study concerning optometric aptitude per se has been published
(11) to the knowledge of the writers.

Although during the past four years the optometric course
in some institutions has been raised to five years, the schools
are besieged by hundreds of applicants who cannot be admitted.
Both the schools and the profession have found it advisable
to select for admission only those best qualified to handle the
courses offered: courses which are heavily loaded in
mathematics, physics, and the basic biological sciences.

The Los Angeles College of Optometry operates on a five-year
program with two preparatory years required for entrance.
To fulfill these requirements, however, most students have to
present nearly three years of standard collegiate credit in
stipulated areas. In fact, many applicants have completed the
bachelor's or master's degree, while a few have already received
the Ph.D. degree. In addition, there is a sprinkling of
applicants who have been graduated in law, dentistry, medicine,
and osteopathy.

The Los Angeles College of Optometry normally admits
approximately one hundred applicants once a year. Since the
curriculum had been completely revised into a five-year
program, it was decided to admit about 150 in the First Semester
of 1948. The increase in enrollment was partly justified on
the basis that the College would not graduate a class in 1950
because of the introduction of an additional required year in
the new course of study.

Problem

Consequently, it was deemed advisable to develop a satisfactory
test for use in evaluating applicants since preoptometry
grades were only moderately prognostic in that they yielded
a correlation of only 1 t \\i with grades in professional
optometry. The lack of a higher degree of correlation was
probably due in large measure to the presence of systematic
differences in marking standards in various colleges and to
differences in types of preoptometry curricula pursued by the
applicants.
It was decided by the Administration to sift several hundred
applications and to allow about twice as many as would be
accepted to take the entrance examination. A total of 229
applicants completed all tests. A group of this size was deemed
to be sufficient for undertaking an initial exploratory
investigation of the reliabilities of the test units and for
ascertaining the extent to which the measuring instrument would
reflect individual differences in various abilities and traits
hypothesized to be the requirements of the course of study.

The general hypothesis to be investigated may be stated as
follows: Success in professional optometric courses is a
function of (1) general intelligence or mental alertness,
(2) general cultural background, i.e., familiarity with
well-known works in the fine arts and with important individuals
associated with the fine arts, (3) attitudes toward scholastic
activities, and (4) background (achievement) in the basic
sciences. Four test units were employed to measure these four
hypothesized functions, one test for each function.

Nature of the Test Units

The tests were grouped into two sections with a general
sheet of directions for the examiners. The first section was a
battery of four basic-alertness or general-classification
sub-tests, each of which was administered on a strict time-limit
basis. The second section consisted of three separate tests
administered as amount-limit instruments. Descriptions in
slightly more detail may be given as follows:

Section One (Form A). Test I consists of four sub-tests.
Sub-Test 1 is a revised form of an arithmetic test of 10 scaled
items originally used in the Army Alpha. Sub-Test 2 is a
revised form of an opposites test consisting of 20 scaled items
originally used in one form of the Army Alpha. Sub-Test 3 is
a revised form of a cube-counting test consisting of 8 scaled
items used in the Army Beta. Sub-Test 4 consists of 12 items
selected from Canfield's (11) test, which has been used as an
optometric entrance examination. The working time of this
section was twelve and one-half minutes. The first three of
these sub-tests have been described as a battery by Rostion and
Lauer (7), and have been used in printed form for industrial
applications.

Section Two (Form A). Test II is a form of cultural inventory
standardized by Lauer and described in a previous publication
(3). Originally administered as a seven-answer multiple-choice
type of test having an estimated reliability of +.94, it was
revised to be used as a five-answer multiple-choice test in
conjunction with standard IBM scoring forms.

Test III is a test of scholastic attitudes covering the reaction
to such items as library study, class attendance, and laboratory
work, assumed to contribute toward success in college. The
general form of test has been described by Lauer (4).

Test IV is similar to Test II but covers biological material.
The original form was used by Schweet and Lauer (8). It was
also revised to be used as a five-answer test with standard
IBM scoring sheets.

The approximate working time required for Section Two is
thirty minutes, although all examinees were allowed to finish
the test. In order that the testing period might not be unduly
prolonged, some verbal incentive such as "get through as
quickly as possible," or a similar suggestion, was made.

Administration and Scoring of Tests

Most of the tests were administered by the senior author
during the Spring Semester of the 1947-1948 academic year.
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AUmt prJ ‘mi were iPivrft nn tiler wine day in two successive 
periods. The other m per cent were given under authorized and
capable supervision either at the College or by qualified
personnel throughout the country.

The battery of tests was mimeographed to be used with
standard IBM electric scoring sheets. As mentioned previously,
the sub-tests of Section 1 were closely timed. A priori scoring
formulas were thus employed to correct for chance successes
on multiple-choice items. For the last three units the
amount-limit method of administration was used. No negative
weights were assigned to any responses in the attitude test,
but a system of reverse positive weights was used for the
negative items, in which high values were added as low values,
and vice versa.

Reliability coefficients of the various units of the test
battery were computed from correlations of the scores of all
applicants taking the tests. The split-half method was used in
which the odd-numbered and even-numbered items attempted were
correlated, followed by the application of the Brown-Spearman
formula. The estimated reliabilities of the test units are
presented in Table 1.
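
The scoring and reliability steps mentioned above can be
sketched as follows. The exact a priori scoring formula is not
given in the text; the sketch assumes the conventional
rights-minus-a-fraction-of-wrongs correction for chance. The
1-to-5 weight range for the attitude items is likewise an
assumption, and all function names are hypothetical.

```python
def corrected_score(rights, wrongs, n_choices):
    """Assumed correction for chance success: R - W / (k - 1)."""
    return rights - wrongs / (n_choices - 1)


def reverse_weight(value, low=1, high=5):
    """Reverse-key a negatively worded item: high becomes low and vice versa."""
    return low + high - value


def spearman_brown(r_half, factor=2):
    """Step up a part-test correlation to the reliability of the full test.

    factor=2 is the split-half case (odd items against even items).
    """
    return factor * r_half / (1 + (factor - 1) * r_half)


# Hypothetical examinee on a five-choice test: 30 right, 8 wrong.
print(corrected_score(30, 8, 5))
# Hypothetical response of 5 on a negatively worded attitude item.
print(reverse_weight(5))
# Hypothetical odd-even correlation of .60.
print(round(spearman_brown(0.60), 2))
```

Because omitted items count neither as right nor as wrong, the
correction lowers only the scores of examinees who marked wrong
answers.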

A reliability coefficient was similarly estimated from
correlations of measures within the criterion (first semester
grades in the professional optometry course). Initially, two
sets of courses were picked at random such that half the courses
were in one set and half in the second set. No systematic
combinations of highly related content courses resulted. The
product-moment correlation obtained between grade averages in
the two sets of courses after correction for length was .88
(N = 133), which is probably to some extent spurious. The
magnitude of the coefficient may have been due, in part, to the
role of a possible "halo effect" in the assigning of marks in
view of the small number of instructors and the degree of
unintentional cooperation among them.

There were 133 subjects for whom data were available for
each test variable. The product-moment intercorrelations of the
tests and criterion are presented in Table 1. The
multiple-correlation coefficient was calculated by Wallace and
Snedecor's version (10) of the Doolittle method. In addition to
this coefficient, the beta weights and the contributions of each
test to the total predicted variance are presented in Table 3.
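
The relation among beta weights, contributions to predicted
variance, and the multiple correlation can be illustrated with a
short sketch. It solves the normal equations by ordinary Gaussian
elimination, which is a stand-in for the Doolittle worksheet
layout and not a reproduction of it. The correlations used are
hypothetical and are not the values of this study.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def multiple_correlation(r_xx, r_xy):
    """Return beta weights, each test's contribution (beta * r), and R."""
    betas = solve(r_xx, r_xy)
    contributions = [beta * r for beta, r in zip(betas, r_xy)]
    return betas, contributions, sum(contributions) ** 0.5


# Hypothetical two-test battery: the tests correlate .50 with each
# other and .60 and .40 with the criterion.
r_xx = [[1.0, 0.5], [0.5, 1.0]]
r_xy = [0.6, 0.4]
betas, contributions, big_r = multiple_correlation(r_xx, r_xy)
print([round(b, 3) for b in betas], round(big_r, 3))
```

The contributions sum to R squared, which is why a test that
correlates highly with another predictor can add little unique
variance even when its own validity is fair.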


TABLE 1

Intercorrelations of Tests (1-4) and Criterion (5), N = 133

[Table body, including the rows of means and standard
deviations, is not recoverable from the source.]

When the relative degree of homogeneity of the restricted
group was considered, an obtained coefficient of multiple
correlation of .56 appeared to be substantial. The standard
deviations of scores on the various test units for the selected
group of 133 students were from three-fourths to four-fifths the
size of those obtained for the total (unrestricted) group of 229
applicants. On the assumption that the standard deviation of the
composite score of the total number of applicants taking the
tests (N = 229) would have been about 25 per cent larger than
that of the restricted group (N = 133), the magnitude of the
probable correlation between the composite score and the
criterion grade average of the total group (had all applicants
been admitted to training) might be estimated to be in the
vicinity of +.65.¹
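
The estimate for the unrestricted group can be checked with the
standard univariate correction for restriction of range. The
text does not name the formula used, so its use here is an
assumption; with the obtained coefficient of .56 and a standard
deviation assumed to be 25 per cent larger in the total group,
it gives a value close to .65. The function name is hypothetical.

```python
from math import sqrt


def correct_for_restriction(r_restricted, sd_ratio):
    """Estimate the correlation in the unrestricted group.

    sd_ratio is the unrestricted SD divided by the restricted SD
    on the variable used for selection.
    """
    r, k = r_restricted, sd_ratio
    return r * k / sqrt(1 - r * r + r * r * k * k)


estimate = correct_for_restriction(0.56, 1.25)
print(round(estimate, 2))
```

The correction assumes a linear relation and equal error
variance across the whole range of ability, so the result is
better read as a rough indication than as an exact value.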

1 An .mpavtd aimi'ii of wh<u ^

would have been for ihe loial snmplc ^ test variables, upon which restriction
ilie iinrcitrictcd grnup between each of the four test variables, upon

Examination of Table 3 reveals that the test in Alertness
(General Intelligence) stands far above the others in predictive
value. Next in importance is the Test of Scholastic Attitudes.
The advantages gained by this test are its low correlation with
both the intelligence test and the other tests and its
significant correlation with the criterion.

Such a finding suggests the possible value of tests of this type
in selection procedures. It should, however, be tried with new
samples to determine whether it will consistently contribute
to the prediction of academic success. Further refinements of
the test, including new item analysis and the possible addition
of other items, are being undertaken with the view of
increasing its validity.

The contribution of the Cultural Inventory to the total
predicted variance was substantial. In view of its high
correlation with the test of Alertness it would not be expected,
however, to make a large unique contribution to the multiple
correlation.

TABLE 3

[Table body, giving the beta weights and the contribution of
each test to the total predicted variance, is not recoverable
from the source.]

The test in Biological Background failed to add much to the
total predicted variance. Since most of the work in the First
Semester tends to be more closely allied to background in
mathematics and the physical sciences, this test may show its
value in subsequent semesters.


General Summary and Conclusions

A battery of four test units was assembled for the evaluation
of optometric aptitude. Of the 229 applicants to whom the
battery was administered, 133 entered and completed the first
semester's work in the Los Angeles College of Optometry.
In terms of the results obtained from a correlational analysis
of the data for the sample studied, the following conclusions
may be drawn:

1. The best single test used for prognosticating aptitude in
professional optometric training was one of general
intelligence. Although the machine-scored technique appeared to
lower the reliability of the test somewhat and although the
group selected tended to be relatively homogeneous in ability, a
correlation of .48 of the test with the criterion was obtained.

2. The second-best test used in the study was one in scholastic
attitudes. The inclination to study, as measured by the test,
did not seem to be significantly correlated with intelligence.
Such a finding should be further explored.

3. Cultural knowledge was quite strongly correlated with
grades, but partly because of its being closely associated with
intelligence.

4. Biological background itself did not seem appreciably to
be associated with success in optometric training during the
First Semester. This may have been due to the fact that this
semester is weighted heavily with materials of a mathematical
and physical nature.

5. The results obtained would seem to warrant the advisability
of submitting every preoptometry student to a similar
battery for at least two purposes: (a) to aid in developing the
best procurement technique possible; (b) to assist in assigning
responsibility for good work to individual students at the
beginning of the professional course in optometry.

Suggestions for Future Research 

As to the improvement of the over-all validity of a test
battery for selection of students for optometric training,
additional tests might be included advantageously. Tests which
measure the psychological traits of spatial relations,
visualization, and perceptual speed might possess differential
validity (5, 6). A preliminary job analysis of activities
involved in various laboratory subjects of the curriculum has
indicated such a possibility. A biographical data blank might
also be of considerable value to over-all prediction, if a valid
scoring key could be achieved.

THE EFFECT OF CLIENT PARTICIPATION IN
TEST INTERPRETATION

PAUL L. DRESSEL AND ROSS W. MATTESON
Michigan State College
The Problem 

The role of tests in the counseling process is as varied as the
viewpoints about counseling. At the present time, test
administration and interpretation practices generally suggest a
rather directive, authoritarian type of counseling. The
counselor, in effect, says, "If you will unquestioningly fill
out the forms and take all the tests I prescribe, I will be able
to tell you what to do." The inability of counselors to break
away from this concept of tests is probably one basic reason for
the disinterest in tests commonly exhibited by the Rogerian
adherents. At the verbal level, at least, few will quarrel with
the idea that counseling should seek for the development of the
client's self-understanding, self-acceptance, and
self-sufficiency, always with due regard for his social
responsibility. Acceptance of this viewpoint involves the
obligation of determining whether, and how, tests contribute to
this development. Our attention then focuses more on the
client's impressions and reactions than on the test data.

Test Interpretation Practices

A review of counselor procedure in test interpretation at the
Michigan State College Counseling Center indicated wide
variation in practice. Individual counselors claimed to vary
their test interpretation procedure greatly in terms of the
needs and personality of the individual client. Most counselors
felt that the entire staff functioned in about the same way in
dealing with this aspect of counseling. Actual practices may be
listed as follows:

1. Counselor gives an interpretation directly from test data. The
client may see the data, but it has little or no meaning to him.
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2 fawtubj ntfi a if ft pi c,filr, bus umti it only a* a visual aid 
on hr* ih°< on^i r *n 

3. Counselor gives a very general summary of the test results
and emphasizes the implications rather than the test data.

4. Counselor explains profile to client and urges that he
comment and question at all stages of interpretation.

<■ After « hod introdii* lion to the profile, the chatt u 
allwceJ /x? * twmrnf, arid Jaiu jm tff mli The counselor 

cmu rniwir^ nit mnnnuinwg the * hem's firm of thought 
TJuw pratMm differ brgrjy on the amount of client purtict- 
patiori mvoSvrd- varying (mm a enumeht lecture to a 

thru! k»s ifnfutitutr Several aflntav on the pail few years have 
drall with waller* rifely related to this 

Rogers (4) has stated that the client-centered counselor
makes less use of tests and uses them in a different fashion.
"They [psychometric tests] do not stand up well as a technique
for client-centered counseling." He indicates, however, that
tests may be introduced at the client's request, but that even
then the focus of counseling remains on the emotional attitudes
expressed. Admitting the value of tests in selection and in
research, Rogers holds that, clinically, tests initiated by the
counselor hinder the counseling process whose purpose is to
release growth forces.

Bordin and Bixler (2) describe an interview procedure
wherein the process of test selection serves to enable the client
better to understand the significance of materials related to
his own feelings and to facilitate the development of a deeper
understanding of the problem. They hypothesize, further, that
this test selection procedure serves as motivation for the testing
program itself and that it fosters the client's recognition of his
own responsibilities in the counseling process. These writers
suggest needed research studies involving the effect of clients'
participation on their acceptance of the test results, their
assumption of responsibility for solving their problems, and their
attitudes toward taking tests and toward the counseling process
afterwards. In a study of the kinds of test choices clients make
and their behavior during the test selection process, Seeman
(6) had fifty electrically recorded interviews classified by
qualified judges as to the division of counselor-client responsibility.
The experimental counselors were found to be consistent in
their use of the client self-selection method of choosing tests.
It was found that clients select tests available for prediction
in [illegible] per cent of the [illegible] and that their reactions
to and during the process varied.
Bixler and Bixler (1) discuss categories of test-interpretation
technique involving varying degrees of counselor opinion. They
conclude that counselors should:

1. Give the client simple statistical predictions based upon
the test data. 2. Allow the client to evaluate the prediction
as it applies to himself. 3. Remain neutral toward test data
and the client's reactions. 4. Facilitate the client's
self-evaluation and subsequent decisions by the use of therapeutic
procedures. 5. Avoid persuasive methods. Test data should
provide motivation, not the counselor.

A study of the relationship of the amount of counselee talk
to the effectiveness of counseling is reported by Carnes and
Robinson (3). Analyzing typescripts of 78 interviews, they
found a wide range in the amount of client talk and growth
in client insight, with the topic of the unit influencing the
relationship. The writers conclude that, since causal
relationships are not clear, "it is not possible to use the amount of
client talk as a criterion of counseling effectiveness."

It will be seen that none of the above references seems to
have been concerned with exactly the problem discussed here.

The Hypotheses for Investigation

The issues raised in the preceding discussion were formulated
into three major hypotheses for investigation.

Hypothesis I. Clients who participate more actively in the
test-interpretation process gain more in self-understanding than
do those who participate less.

Hypothesis II. Clients who participate more actively in the
test-interpretation process are more certain of their final
vocational choice than are those who participate less.

Hypothesis III. Clients who participate more actively in the
test-interpretation process are more satisfied with the
experience than are those who participate less.
It is apparent that these hypotheses do not cover all the
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-iT^inS Inj’Vlusr „ urrrjiif^ *i«!?u;rndv % <'im|‘«lf<4(«*J for,in initial 

t ))Vf 'ill?* ll.WJ 

fifithipwitm *f f nfwiri JiJtf hitfluaint ImhummU 

[illegible] must be preceded
by a definition of the phrase. The following list of principles
was developed and these, for the purpose of this study, define
client participation.

1. Do not discuss test scores until it is clear that the client
has nothing more important [illegible] for discussion and
is emotionally ready to deal with them.

2. A test profile [illegible] be used as the most meaningful
and [illegible] form for test results to a client. Ability,
achievement, interest, and aptitude scores may readily be
placed on one. Personality ratings must frequently be
handled separately, partly because of their nature and partly
because [illegible] differently interpreted.

3. Explain the general form of the profile, emphasizing the
different qualities measured and the comparison of the
individual with other individuals.

4. Avoid using or discussing technical and statistical details
as much as possible.

5. Encourage the client to voice his own explanations,
hunches, and feelings.

6. Answer the questions but do not permit information
[illegible] the emotional content of the questions.

7. [illegible] the client's feelings and concerns but do not
offer [illegible] except as it is incidental to supplying
additional facts for interpretation. Above all, do not get involved
in defending the tests or in impressing on the client results
not yet acceptable to him.

8. Allow the client adequate time to react to the individual
elements of the profile. Do not rush him into formulation of
conclusions.

9. Encourage the client to relate his own experiences and
other known facts to the test results.

10. Do not, by silence or otherwise, seem to give tacit
approval to statements which are erroneous, but do not be too
hasty in correcting them.

11. If the client sees only one course of action implied by
the test results, do not at once suggest others, but do not
permit him to judge by default that you concur.

12. The counselor should suggest the advisability of
additional information, if the client does not himself sense the need.
In general, it is better that occupational information be
introduced on request.

13. Test scores should not be too heavily emphasized since
they are probably less important than a multiplicity of other
factors in determining the client's final course of action.

The general import of these principles is that the client is
to be given the opportunity to ask questions, venture his own
hunches and, in short, to develop the counseling session in the
direction of his own interests and concerns. If he leads it away
from test results it is assumed that there is greater value in
following his stream of thought than in leading him back to
the test results. Although these principles have been stated as
imperatives, it is to be kept in mind that their appropriateness
in test interpretation is really the hypothesis to be investigated.
For the purposes of the study the counseling sessions were
now to be rated in terms of the extent to which these principles
were followed, so that it became necessary to develop from
them a rating scale for use by judges. After several trials the
scale shown in Figure I was agreed upon as the basic instrument
for this purpose. While not entirely satisfactory to the judges,
it was sufficiently clear that on several trials the independent
judgments of interviews gave very close agreement on the
separate items as well as on the total scores. The rating scale
was not shown to the counselors involved in the study.

It was also necessary to develop a test of self-understanding
in order to determine the actual increase in the client's
self-knowledge of qualities tested. This test [illegible] questions
testing the client's understanding of the level of his various
abilities in relationship to norms established for various groups.
Appended to this for convenience in administration were
questions designed to ascertain feelings of vocational security
as well as the extent of his satisfaction with the counseling
received. These parts of the test are handled separately
[illegible] not reproduced here because
of the space it would require.

The necessity for using a common set of questions imposed
the necessity of using a group of students exposed to the same
testing experiences. This, in turn, imposed the necessity of
some similarity in the nature of the clients involved in the
investigation. It was agreed that no-preference freshmen (those
entering college without a declared major) would be used as
subjects. The use of a common test battery with this group
was appropriate for initial exploration of vocational choice.
After the experimental interviews counselors were free to use
additional tests if desired.

For the purpose of this study the uniform test battery
included:

1. A.C.E. Psychological Examination

2. Cooperative Reading Test (Higher Level)

3. Minnesota Paper Form Board

4. [illegible] Test

5. Kuder Preference Record

6. California Test of Personality

A special test profile sheet was prepared for use of counselors,
but it is in no sense unique and hence is not included here.

The Experimental Design

The procedure involved the following steps:

1. Development of the materials presented or described
above.

2. Selection of a number of counselors interested and willing
to cooperate in the study. (Originally it was intended that
seven counselors handle seven clients each, but illness,
leaves of absence, and other unexpected factors resulted in
unequal numbers of clients for each counselor.)

3. Selection by the cooperating counselors of the subjects
for the study. The client's willingness to cooperate and to have
the interview recorded was determined before including him
in the study.

4. Administration of the Test of Self-Understanding to each
client before taking the rest of the counseling battery.

5. [illegible] of the uniform test battery.

6. [illegible] of the test results under the same physical
conditions, with a [illegible] recording made of the entire interview.

7. Rating by four judges of the recorded interviews.

8. Readministration of the Test of Self-Understanding and
completion of the appended [illegible] dealing with satisfaction
and [illegible].

9. Repetition two months later of the test and questionnaire
to determine retention and continued satisfaction and security.

10. Statistical analysis utilizing analysis of variance and
covariance to determine the extent and nature of the relations.
There are a few other points of procedure not adequately
covered above, knowledge of which is necessary for
following the rest of this report. The forty clients involved were
all freshmen with the same kind of problems, they took the same
tests, and had an interpretation interview of approximately
[illegible] minutes in length. [illegible] and there was no
evidence of any [illegible]. The Test of Self-Understanding
was scored by [illegible] the client's responses
with test results. The score differences between pre- and
post-tests were used to measure the increase in
self-understanding.

The rating of a particular interview by a judge was obtained
by totaling [illegible] of the descriptions checked
by the judge. This permitted a maximum of [illegible] points
([illegible] items). Each judge was allowed
up to [illegible] points to be used if an interview seemed
better in respect to student participation than the standard
scale indicated. These extra points were little used and the
scale remained essentially a [illegible]-point range.

Analysis of the Data

As a preliminary step the rating scale itself was analyzed
by tabulating the individual ratings of each of the judges for
each interview on each of the items covered. This analysis,
when compared with corresponding gains evidenced in clients'
self-understanding, showed that the order of the numbered
responses corresponded to larger gains in self-understanding.
This was not regarded as a validation but as a check on the
consistency of the weighting of the item alternatives and also
as an indication that some relationship existed between
participation and gain.

The individual ratings of the four judges varied for the forty
interviews from a low of nineteen points to a maximum of
forty. While differences were found among the judges, the
agreement was very close. When the individual judge's rating
of a particular interview was compared with the mean for the
four judges, one half of the ratings were found to be within one
point of the mean for that particular client. Application of
analysis of variance to the ratings gave the results shown in
Table I.


TABLE I

Analysis of Variance Applied to Ratings

Source of Variation            d.f.          Sum of Squares   Mean Square   F
Total                          159           [illegible]
Counselors                     [illegible]   1881.00          313           70.33*
Students for same counselor    [illegible]   [illegible]      26.78         6.00*
Raters for same counselor      [illegible]   [illegible]      4.40          3.08*
Remainder                      [illegible]   [illegible]

* Significant at the 1% level.


While the variation among raters for the same counselor
was highly significant, its magnitude was less than that
among students for the same counselor and much less than
that among counselors. Differences of opinion concerning one
of the interviews of one counselor accounted for the significance
of the difference among raters. It was considered that the
uniformity of rating was such that the sum of the four ratings
would adequately characterize the participation factor of an
interview.
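
The two summaries used in this paragraph, the share of individual judge ratings falling within one point of the four-judge mean and the sum of the four ratings taken as a participation index, can be sketched roughly as follows. The ratings and function names below are invented for illustration only; they are not the study's data or procedure.

```python
from statistics import mean


def participation_index(ratings):
    """Sum of the judges' ratings for one interview."""
    return sum(ratings)


def share_within_one_point(interviews):
    """Fraction of individual ratings lying within one point of the
    mean rating for their own interview."""
    close = 0
    total = 0
    for ratings in interviews:
        interview_mean = mean(ratings)
        for rating in ratings:
            total += 1
            if abs(rating - interview_mean) <= 1:
                close += 1
    return close / total


# Hypothetical ratings by four judges for three interviews.
interviews = [
    [30, 31, 29, 30],
    [22, 25, 19, 22],
    [38, 38, 40, 36],
]
indices = [participation_index(r) for r in interviews]
share = share_within_one_point(interviews)
print(indices, round(share, 2))
```

Summing rather than averaging the four ratings changes only the scale of the index, not the ordering of interviews.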

Analysis of variance applied to raw gains in self-understanding
(difference between pre- and post-test scores) showed highly
significant differences in the gains made by the clients handled
by different counselors. Having also found highly significant
differences among counselors in the extent of client
participation, an analysis of covariance was made to ascertain to what
extent and in what way the two factors might be related.
[illegible] participation does account for
a portion of the differences in gains [illegible]. This conclusion
results from the fact that although the gains differed
significantly from counselor to counselor, the adjustment of these by
means of the participation index resulted in non-significance.
Further details of the covariance analysis involving
tests of the significance of the "within" regression, of the
"among" regression, of differences between the regressions,
and of the corresponding correlations suggested the following
additional conclusions:

1. The "within" regression ([illegible]) and accompanying
correlation ([illegible]) were not significant.

2. While the relationships of participation to gain for the
various individual counselors differed greatly, the results were
not statistically significant because of the small number of
cases involved for each counselor. For example, for one
counselor the correlation between gain and participation was +.66
while for another it was -[illegible].

3. The "among" means regression ([illegible]) and correlation
([illegible]) were found to be significant.

4. The gains of individuals assigned to a given counselor
follow a different trend than the mean gains for all clients of
a counselor. In fact, the counselor means have a definite trend;
the individuals do not.

* This discussion and [illegible] involve the procedures outlined by
Snedecor in his Statistical Methods [illegible].

The findings of this portion of the statistical analysis indicate
that the differences in the gains in self-understanding made by
clients assigned to different counselors are correlated with the
differences in the amount of client participation attained by
the counselor. While each counselor did vary from client to
client in the amount of participation, this variation was small
compared to the differences among counselors and, furthermore,
it did not show any consistent relationship with the client's
gain in self-understanding.
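
The contrast drawn here, a clear trend among counselor means alongside no consistent trend among the clients of a single counselor, can be illustrated with a small sketch. It assumes an unweighted correlation of counselor means and a pooled within-counselor correlation; the source does not state how unequal numbers of clients were weighted, and the data and names below are hypothetical.

```python
from math import sqrt


def _mean(values):
    return sum(values) / len(values)


def _sums(xs, ys, mx, my):
    """Sums of squares and cross-products about the given means."""
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def among_and_within_r(groups):
    """groups maps a counselor to a list of (participation, gain) pairs.
    Returns (r among counselor means, pooled r within counselors)."""
    mean_x, mean_y = [], []
    w_sxx = w_syy = w_sxy = 0.0
    for pairs in groups.values():
        xs = [p for p, _ in pairs]
        ys = [g for _, g in pairs]
        mx, my = _mean(xs), _mean(ys)
        mean_x.append(mx)
        mean_y.append(my)
        sxx, syy, sxy = _sums(xs, ys, mx, my)
        w_sxx += sxx
        w_syy += syy
        w_sxy += sxy
    a_sxx, a_syy, a_sxy = _sums(mean_x, mean_y, _mean(mean_x), _mean(mean_y))
    r_among = a_sxy / sqrt(a_sxx * a_syy)
    r_within = w_sxy / sqrt(w_sxx * w_syy)
    return r_among, r_within


# Hypothetical (participation, gain) pairs for three counselors.
groups = {
    "A": [(19, 3), (20, 0), (21, 3)],
    "B": [(29, 5), (30, 2), (31, 5)],
    "C": [(39, 7), (40, 4), (41, 7)],
}
r_among, r_within = among_and_within_r(groups)
print(r_among, r_within)
```

In this constructed case the counselor means rise together while the deviations of clients from their own counselor's means are uncorrelated, which is the pattern the paragraph describes.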

A similar statistical analysis was made, using gains from
pre-tests to tests given two months after counseling. These
gains were considered as of more permanent nature. All
significance ratios were found to be in the same relationship as
those computed in the first analysis, using the gains
immediately after counseling, but they all missed significance by
varying amounts. The correlation between counselor means on gain
and participation dropped to [illegible] and was below the 5 per cent
level of significance.

One question in the Inventory of Self-Understanding asked
the student to indicate his degree of vocational certainty as:
confused, a bit uncertain, fairly certain, certain and secure.
These responses were scored as 0, 1, 2, 3, and gains in certainty
computed between pre-tests and each of the post-tests. The
participation ratings and these gains were then [illegible]
covariance analysis in a manner similar to that mentioned
previously. Non-significant ratios, regressions, [illegible]
were found for gains from pre-test to testing immediately after
counseling. The correlation between the mean participation
ratings of the counselors and the gain in certainty [illegible],
which is not significant for five degrees of freedom. [illegible]
when gains from pre-test to final testing [illegible] months
[illegible] counseling were used, the corresponding [illegible]
found to be [illegible], which is significant at the 5 per cent level.

The amount of satisfaction J* hu . 

sampled by several questions winch were c d 

mental scores- This reset. on obv J “ r s w r e ob- 
teriria «l gain from pre-test, but sattstaction 
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If rWfr/fW 

\lH i-mTU(f-ni hr # * t^ri^ fjir'r (IMS's! h r tuft^lrjcrcil a*? 
tCit'MIjie ?«0 Jin*r t>? (rise <fljj!lJ miimher * ,m!I t <jj* 15 !% arSfl i^HinvctoM 
grni?hc<IL if■!? dir j* "ms i,wlrdj*rJ «. ffimlntV *4 fionu* of 

nine oirnuiMiff^r^t'i nfj-nJ h ^ tHsm j*i I**’ kept jji m if til Hut the 
UKj^ mj itoJifU^ iruliTfi trill J(rj df at ill * 4 HAtftwUl 1 & SVAS 

jimmi mSsueh **( she ii^juS tijr In hill riiMircflrM ol 
lSi|t e And tr*rlhrf HK^JlSvJlr^VJ rae i rf<d umdu^mH 

vtliuli i*r lu*^ «hf|l tw * j*tu alU ?*< 4iumol hv <<dtm jiul imictt 

m J .1 l»| 

1 

1. The hypothesis that students who participate most gain
most in self-understanding can be confirmed, but in a
somewhat different manner than anticipated. An individual
counselor varies from client to client in the amount of participation
elicited but these variations seem to have little or no
relationship with the client's gain in self-understanding. Counselors
vary greatly among themselves in the amount of client
participation elicited and the mean gains in self-understanding
made by their clients appear to be rather closely related to
the mean client participation index. Rogers ([illegible]), in speaking
of client-centered counseling, points out that it is more a matter
of "attitude" than of "method." He contends that the client
is apt to [illegible] that the counselor is using a method
[illegible] intellectually [illegible] and, accordingly, not react well
to it. Perhaps the individual counselor's variations in client
participation are in the same class. Another possibility is that
variations from the counselor's general level of client
participation are forced on the counselor by the idiosyncrasies of the
client.

2. The hypothesis that students who participate most are
more secure in their vocational choice is given some
confirmation by the data. Evidence immediately after counseling is
favorable to this hypothesis and both practical and statistical
significance are attached to the fact that after a lapse of two
months the relationship is even greater. If this should be the
case generally it would suggest that high client-participation
counselors are more successful in stimulating client growth
than are other counselors.

3. The hypothesis that students who participate most are
more satisfied is not confirmed by the data. Neither is there
any evidence of [illegible].
The relationship of satisfaction to gain in self-understanding
and to gain in feeling of vocational security was of the same
order as that with client participation. The implication is that
client satisfaction is not a very direct or reliable indicator of
the effectiveness of counseling.
As already indicated, these findings are presented with the
feeling that the issues raised and the highly tentative
conclusions reached deserve far more extensive study.

ALBERT ELLIS

[illegible], Menlo Park, N. J.

While it is a common procedure for college (and other)
instructors to have their students mark their own objective-type
examination papers, the marking by students of their
own essay-type examinations is a much more unusual
procedure. The present author, having a small enrollment in his
Rutgers University class in General Psychology, decided to
experiment with one form of self-marking essay-type
examination. The results of this experiment are reported below.

The procedure of the experiment was as follows. Eleven
members of a class in General Psychology were given a surprise
examination of the regular essay type during one of the class
sessions. There were five essay questions on the examination,
four of which were assigned a maximum score of five points,
and one of which (Question No. 3) was assigned a maximum
score of ten points.

The papers were collected immediately after the thirty-minute
examination period had ended, and each paper was
anonymously assigned a code number by the instructor. Then, one
by one, the examination questions were discussed by the
instructor and the members of the class, and the correct or ideal
answers to each question were agreed upon. After the correct
answer to each question was brought out in the course of the
classroom discussion, the students' individual written answers
to this question were read aloud by the instructor to the class,
and each student was asked to rate each answer except his
own. Thus, for each of the five essay-type questions read aloud
to the class, ten ratings were given by the class members, none
of whom could presumably identify any of the answers except
s I few kM m by Ft&kri Mi 8«cHl#ri n> whom lha author 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT

his own. At the same time, the instructor also rated each
answer.

When the [illegible] individual ratings on the questions (ten students
each rating [illegible] questions) had been thus obtained, the students
were asked to turn in their rating sheets, with their signatures
on each sheet. The mean rating given by the students to each
question answered by every other member of the class was
then calculated and compared to the rating given to the same
question by the instructor. A comparison of the mean ratings
of the eleven students and the ratings of the instructor to the
same questions answered by identical students is shown in
Table 1.
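
The calculation described here, the mean of the ratings each answer received from every student other than its author, set beside the instructor's rating of the same answer, might be sketched as follows for a single question. The three-student data, the dictionary layout, and the function name are assumptions made for illustration and do not reproduce the class's ratings.

```python
def mean_peer_ratings(ratings):
    """ratings[rater][author] is the rating one student (rater) gave to
    another student's (author's) answer. Returns the mean rating each
    author received from the other students."""
    received = {}
    for rater, given in ratings.items():
        for author, score in given.items():
            if author == rater:
                continue  # a student does not rate his own answer
            received.setdefault(author, []).append(score)
    return {author: sum(s) / len(s) for author, s in received.items()}


# Hypothetical ratings on one five-point question.
ratings = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 5, "C": 3},
    "C": {"A": 4, "B": 3},
}
instructor = {"A": 4, "B": 4, "C": 3}

means = mean_peer_ratings(ratings)
# Positive values: students rated the answer higher than the instructor did.
differences = {s: means[s] - instructor[s] for s in means}
print(means, differences)
```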

From an examination of Table 1 the following observations
may be made:

1. In most instances the mean rating of the students was
remarkably close to the rating given by the instructor to a
given question answered by a certain student.

2. The differences between the mean ratings of the students
and the instructor were almost never very consistent.
In the case of the examination paper of student No. 1 in Table
1, the students consistently rated the answers to this paper a
little higher than did the instructor. But in all the other cases,
no such bias is apparent.

3. Considering the students' and the instructor's ratings on
all the pupils' answers to each of the five questions, it would
appear that in Questions 1, 4, and 5 the two sets of ratings
were remarkably similar, while in Question 2 there was a
fairly consistent tendency of the students to give higher ratings
than the instructor, and in Question 3 a fairly consistent
tendency of the students to give lower ratings than the instructor.

Rank order correlation was computed between the marks
(obtained from the Total column in Table 1) finally assigned
to each examination paper by (a) the means of the students'
ratings and (b) the instructor's ratings. Rho was found to be
[illegible]. The Pearsonian method of correlation was computed
between the same sets of final marks and was found to be [illegible].
The Pearsonian correlation coefficient was also computed
between the total mean ratings given to each question by the
eleven students and the total mean ratings given to each
question by the instructor (obtained from the last row in
Table 1). This correlation was found to be .98.
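
For readers who wish to see the two coefficients side by side, a minimal sketch of rank order (rho) and Pearsonian correlation is given below. It assumes no tied marks, uses the common formula rho = 1 - 6Σd²/(n(n² - 1)), and runs on invented marks rather than the figures of Table 1; the function names are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank 1 is the highest value; assumes no tied values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Rank order correlation from squared rank differences (no ties)."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical final marks for four papers.
student_means = [22.5, 18.0, 20.5, 15.0]
instructor_marks = [23, 19, 18, 14]

rho = spearman_rho(student_means, instructor_marks)
r = pearson_r(student_means, instructor_marks)
print(round(rho, 2), round(r, 2))
```

Rho depends only on the ordering of the papers, whereas the Pearsonian coefficient also reflects how far apart the marks lie, so the two need not agree closely on a small class.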

Each of the rating sheets handed in by students was then
examined to see whether there was any significant relationship
between the total ratings given by each student to all the others'
examination papers and the final rating given to his own paper
by the other students and the instructor. A Pearsonian
correlation coefficient of -[illegible] was found between the ratings given
by the students to others and the ratings they received from
the others. This means that there was apparently a slight,
but highly unreliable, tendency for the students who did best
on the examination to give stricter and less generous ratings
to the other students.

From the foregoing data and calculations of this study,
the following conclusions seem to be warranted:

1. The mean ratings of the students to their own essay-type
examination papers were very similar to the ratings given to
their papers by the instructor.
2. Neither the students nor the instructor utilized consistent
halo or under-rating effects in their markings of the examination
papers.

The main advantages of having the members of a class rate
their own essay-type examination papers seem to be these:
(a) The instructor has a check on the accuracy and fairness of
his own marking system. (b) The marking of the papers itself
becomes a stimulating learning process which may have
considerable value. (c) There seems to be virtually no come-back
on the part of the students or claims of unfair marking which
are so frequent after an essay-type examination which is
exclusively marked by the instructor.

The main disadvantages of the student rating of essay-type
examinations seem to be these: (a) The actual marking, from
the instructor's standpoint, takes somewhat longer than the
usual kind of instructor marking, since the answers have to
be read aloud to the class, and calculation of mean student
ratings must be done. (b) The reading aloud to the class of
essay-type questions is quite impractical if there are many
examination questions or if there are more than fifteen or
twenty students in the class.

In view of the data concerning the student marking of
essay-type examinations presented in this paper, and in view of the
general advantages of essay-type tests which have recently been
pointed out by Freeman (1), Luchins (2), Vallance (3), White
(4), and others, it would seem that further experimentation
with group marking is well warranted where the instructor has
a fairly small class.
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RELATION OF CYNICISM TO CERTAIN 
STUDENT CHARACTERISTICS


CHARLES O. NEIDT and MARTIN F. FRITZ
Iowa State College

Expressions indicative of cynicism are frequently
encountered, but this characteristic of personality has been subjected
to relatively little investigation. During the development of
a test designed to measure cynicism, 400 college students
furnished personal data as to their sex, age, religious preference,
political preference, marital status, educational class level, and
father's occupation. The statistical technique, analysis of
variance, was used to determine the relation between the cynicism
scores and the personal characteristics of these students.

The testing instrument consisted of 200 items, each concerned
with a situation toward which a subject could express cynicism.
A detailed description will not be given here as this is available
in other publications (r, 6). The following statements
illustrate the types of items used: "I would say that perhaps
as much as half of our tax money finds its way into the hands
of grafters" and "I believe that at least 90 per cent of the
girls would rather marry a poor boy whom they love than a
rich man whom they do not love." A total weighted score for
each subject was obtained by a simple procedure which was
found on the basis of statistical study to be the optimal scoring
plan (6). Attention has been given to the opposite or what
might be called "idealistic" responses but these investigations
will not be considered in this report (1). Copies of the test
were distributed at random intervals to 400 students enrolled
in psychology courses at Iowa State College. Each subject was
requested to complete a short questionnaire giving certain
personal data. The tests were completed and returned at the
convenience of the students, without signature, in the hope
that this method would produce more candid or uninhibited
replies.

The scores of 387 subjects were classified by age and sex
as shown in Table 1. Thirteen subjects did not indicate their
age. To determine the relation of cynicism to age and sex the
data were treated by analysis of variance and adjusted for
disproportionality following the method suggested by Snedecor
(7). Differences significant at the one per cent level of confidence
were found among the age groups (Table 2). Differences
significant far beyond the one per cent level of confidence were
revealed between sexes (Table 2), males exhibiting more
cynicism than females (Table 1). However, there seems to be no
"joint effect" of age and sex since the F-value for "interaction"
is not significant.
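
As a rough illustration of the F ratio on which these statements rest, the sketch below computes a simple one-way analysis of variance. It is a deliberately simplified case: the study used a two-way classification with an adjustment for disproportionate subclass numbers, which is not reproduced here, and the scores and function name are hypothetical.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way classification.
    groups is a list of lists of scores, one inner list per class."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-class sum of squares: class means about the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-class sum of squares: scores about their own class mean.
    ss_within = sum(
        (x - sum(g) / len(g)) ** 2 for g in groups for x in g
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Hypothetical cynicism scores for three age groups.
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_b, df_w = one_way_f(scores)
print(round(f_value, 2), df_b, df_w)
```

The resulting F would then be referred to a table of F for the stated degrees of freedom to judge significance at the chosen level.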


TABLE 1

When significant relationships between characteristics and
scores are found, it is desirable in making further analyses
to control these characteristics in the classification. On the
other hand, even though significant relationships are found,
consideration must be given the size of each class in order to
avoid emphasizing relationships based on suspiciously small
numbers of cases. Although cynicism was found to be
significantly related to age and sex as shown in Table 2, inspection
of Table 1, giving the classification of 126 males and 261 females,
shows that size of the age groups dropped to as small as ten
in number. It was considered desirable, therefore, to control
only on sex for further statistical treatment.

The scores of the 400 subjects were classified by sex and
(1) religious preference, (2) political preference, (3) marital
status, (4) educational level, and (5) father's occupation. The
scores of two Jewish subjects, seven subjects expressing a
preference for the Socialist party, thirteen subjects designating
their marital status as either "widowed" or "divorced" or
designating no marital status, twelve "special" or "graduate"
students, and twenty-six subjects indicating "deceased"
or making no response for father's occupation were disregarded
for computational purposes. The mean scores and frequencies
for the various groups are shown in Table 3. These data were
treated in the same manner as described in the interpretation
of Tables 1 and 2.

Religious Preference.—Students expressing no religious
preference revealed the largest amount of cynicism, and Catholic
subjects revealed the smallest amount among the three groups
for each sex. It should be noted that the non-preference groups
constitute the only comparison shown in Table 3 where the

TABLE 2
female mean cynical score exceeds that of the male. The
differences among the religious preference groups were found to be
significant at the one per cent level of confidence.

Political Preference.—The three male political groups
exhibited more variability of cynicism than the three female
political groups. Also, in every political preference group the
males were more cynical than the females, and this sex
difference was found to be statistically highly significant.

Marital Status.—In every marital status group the females
were less cynical than the males. An extremely high mean
cynical response characterized the engaged male group. A
somewhat lower mean score was found for the unmarried (and
unengaged) male group. This is just the opposite of the
condition found for the corresponding female groups. Although
differences significant at the one per cent level of confidence
existed among the marital status groups, the interaction of
sex and marital status was not significant.

Educational Level.—Some variability was found among the
educational class levels, but the differences revealed were not
significant.

Father's Occupation.—Although a greater difference in
cynicism was found between female farmer and non-farmer groups
than between the corresponding male groups, the differences
between the occupational groups were not significant.


TABLE 3

Mean Score and Frequency of Student Characteristic
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Father's Occupation 

Farmer 

46 
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83 

73 96 
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75 

94.09 
The above findings confirm an earlier study involving the
use of correlational and chi-square techniques (4).

A summary of the F-values for the student characteristics
investigated is shown in Table 4. It was concluded that after
controlling on sex, highly significant differences regarding
cynicism existed among the age, religious preference, political
preference, and marital status groups, but the F-values for the
interactions of these groups with sex were not significant in
any instance. There were no significant differences after
controlling on sex among the educational levels or between the
two occupational groups, farmer versus non-farmer. The
F-values for the interactions of the educational levels and the
occupational groups with sex were not significant. After
controlling on sex, the highest F-value was found for religious
preference, followed in order by marital status, political
preference, age, educational level, and father's occupation.

It is of interest to note that in all classifications significant
differences were found between the sexes, and the consistency
of this finding might well call for further comment. First, it is
possible that the difference is a true difference, that men really
are more cynical than women. If correct, this would suggest an
investigation into the nature of experiences whereby men are
caused to take on a more intensely cynical attitude than women.
A second possible explanation would involve a criticism of the
test itself, that the difference is a function of the items
employed. However, an inspection of the situational statements
does not seem to indicate any great likelihood of "masculine
loading" which might force a greater male cynical score.
Moreover, in the development of a short form of the test, it was
found that the 65 most differentiating items tended to be those
of a somewhat philosophical nature and therefore not
particularly "sex-linked" as might be expected from more highly
specific situations (3). It could be held as a third explanation
that women are more inhibited and not so likely to give a
strong cynical response. According to this hypothesis women
could equal or even exceed men in cynicism but the test
would fail to elicit an expression of its strength. In criticism,
it might be said that such a sex difference in test attitude is
not now accepted and, were it true, would invalidate much of
our present-day testing. It may also be pointed out that the
practical effect of suppressing cynicism might still be the
same as a true difference. In other words, suppression of
cynicism may have the same effect upon behavior as lack of
cynicism.

Summary

The scores of 400 college students on a test of cynicism were
treated by analysis of variance to determine the significance
of the relation of cynicism to certain personal characteristics.
In all classifications, males were found to be significantly more
cynical than females. Variations in age groups were found,
older students, in general, being more cynical than younger
students. Those expressing no religious preference were most
cynical, Catholic subjects the least, with Protestants occupying
an intermediate position. Classified politically (Republican,
Democrat, or no preference) males not only showed greater
variability but in all three groups were much more cynical than
females. In every marital status group (married, unmarried,
engaged) males were distinctly more cynical than females, with
the highest mean value for the engaged group. Although
differences were found in mean cynical scores when classified by
educational level (Freshman, Sophomore, Junior, Senior), these
variations were not significant. Father's occupation classified
as farmer versus non-farmer failed to reveal any significant
differences in terms of degree of cynicism.
ROBERT CALLIS

University of Missouri

In view of our present knowledge, it is logical to suggest that
teaching ability is so complex that it cannot be investigated
efficiently as a unit. However, there are many aspects of
teaching ability which can be isolated and studied independently.
One of these is the aspect of teacher-pupil relations.
Investigations to date indicate that teacher-pupil relations may be
predicted from knowledge of teacher-pupil attitudes; however,
little is known about the effect of training and experience on
these attitudes. Therefore, the major problem of this study was
to investigate the change that occurs in teacher-pupil attitudes
during teacher-training and early teaching experience.

To measure teacher-pupil attitudes the Teacher Attitude
Inventory, a slight extension of the one constructed by Leeds (3,
9, 10), was used. Leeds found that his Inventory would predict
teacher-pupil relations reasonably well (1 ^o.fta between
Inventory scores and a multiple criterion of teacher-pupil
relations). Scores on Leeds' Inventory and the one used here
correlated 0.95; therefore the inventories can be considered
approximately equal in validity and sufficiently valid to justify
further investigation (4).

Briefly, the rationale of the Teacher Attitude Inventory is as
follows. The inter-personal relationships between teacher and
pupils are an integral part of the complex of teaching which
bears directly on the mental hygiene of the classroom. It is
these inter-personal relationships that we are trying to predict
from a knowledge of the teacher's attitudes toward the status
of children and classroom situations involving discipline and
other social factors. In fact, these attitudes might also be termed

1 This paper is a summary of a Ph.D. thesis of the same title on file in the
University of Minnesota Library (1). The writer expresses sincere appreciation to Dr.
Walter W. Cook who served as major adviser to the study.
a part of a teacher's philosophy of education. Leeds' approach
to the prediction problem was an empirical one. The present
study was designed to determine in a general way the stability
of the attitudes being measured.

Many personality inventories similar to the Teacher Attitude
Inventory are susceptible to attempts to fake good or bad (1, 5,
6, 7, 8, 11, 12, 13, 14). If the Teacher Attitude Inventory were
susceptible to faking to a high degree, any change in TAI-scores¹
during training and experience might be "contaminated" by
intentional faking. Before the major problem was attacked, the
fakability of the Teacher Attitude Inventory was investigated so
that proper interpretation of any change in TAI-scores during
training and experience could be made.

The Procedure 

For the investigation of the fakability of the Inventory and
the change in teacher-pupil attitudes during training and
experience, six testing sequences were set up. Each sequence was
composed of two testings of the same group of subjects. These
sequences were:

Sequence 1—the first faking sequence, consisting of a random
sample of First-Quarter juniors in the College of Education,²
Fall 1947, who were first tested by standard instructions and
again four to six weeks later by instructions to fake good.³

Sequence 2—the second faking sequence, consisting of a group
of First-Quarter juniors in the College of Education, Spring
1948, who were tested first by instructions to fake good, and
again ten days later by standard instructions.

Sequence 3—the control sequence, consisting of a group of
First-Quarter juniors in the College of Education, Winter
1948, who were tested and retested by standard instructions
at a week to ten day interval.

Jo-J1 sequence—the first change-in-attitude sequence,
consisting of a random sample of First-Quarter juniors in the
College of Education, Fall 1947, who were first tested
(standard instructions) at the beginning of the school year
and again six months later.

So-S1 sequence—the second change-in-attitude sequence,
consisting of a group of First-Quarter seniors in the College of
Education, Fall 1947, who were first tested (standard
instructions) at the beginning of the school year and again six
months later.

To-T1 sequence—the third change-in-attitude sequence,
consisting of a group of beginning teachers who graduated from
the College of Education, Spring or Summer 1947, who were
first tested (standard instructions) as they graduated and
again after they had been teaching for six months.

1 Scores on the Teacher Attitude Inventory are referred to as TAI-scores.
2 The College of Education referred to here is the College of Education, University
of Minnesota.
3 The subjects were instructed to attempt to make as high a score as possible;
that is, answer the items as they thought a good teacher would.

The data for the experimental sequences were compared with
the data for the control sequence to determine what change in
TAI-scores had occurred as a result of (1) attempts to fake
good, (2) teacher-training, and (3) teaching experience. Also,
an analysis of the responses to individual items was made to
determine which attitudes (items) were affected by training
and which by experience.
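The comparison of an experimental sequence with the control sequence can be sketched as a test on the difference between mean changes when the two variances are not assumed equal. The sketch below uses Welch's approximate statistic as a stand-in; it is not claimed to be the computation used in the study, and all scores and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def changes(first, second):
    """Per-subject change from first testing to second testing."""
    return [after - before for before, after in zip(first, second)]


def welch_t(sample_a, sample_b):
    """Welch statistic and approximate degrees of freedom for two means."""
    va = variance(sample_a) / len(sample_a)  # squared standard error, group A
    vb = variance(sample_b) / len(sample_b)  # squared standard error, group B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical TAI-scores, not the study's data.
experimental_first = [40, 55, 62, 48, 70, 51]
experimental_second = [52, 60, 75, 50, 82, 66]
control_first = [45, 58, 60, 50, 66]
control_second = [47, 60, 63, 51, 70]

exp_change = changes(experimental_first, experimental_second)
ctl_change = changes(control_first, control_second)
t, df = welch_t(exp_change, ctl_change)
print(round(t, 2), round(df, 1))
```

Working on change scores rather than raw scores lets the retest gain observed in the control group serve as the baseline against which the experimental gain is judged.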

The chance-half and test-retest reliability of the Inventory
was determined. Also, the relation of TAI-scores to a measure
of intelligence was determined.

Findings

Major findings. 1. The susceptibility of the Inventory to
attempts to fake good was investigated by comparing the data
from sequences 1 and 2 with the data of the control sequence.
It was found that when the subjects answered the Inventory
first according to standard instructions and second according
to instructions to fake good, an insignificant increase in
TAI-scores occurred (P > .05). (See Tables 1 and 2.) However, when
the subjects answered the Inventory first according to
instructions to fake good and second by standard instructions, a mean
decrease in TAI-scores was observed that approached
significance (P = .02). These data suggested that the Inventory may
be susceptible to faking to a limited extent, so an attempt was
made to construct a scale which would identify faking. After
several trials, a scale was developed which showed promise of
being useful in detecting faking but was not considered useful
in its present form. Since it was found that the attempts to
fake good produced only limited changes in TAI-scores and since
there would be little motivation for the subjects in the
change-in-attitude sequences to fake any more than was characteristic
of the individuals, no allowances for faking were made in the
analysis of the change-in-attitude data.



2. When the data of the Jo-J1 sequence (juniors) were
compared with the data of the control sequence, a significant
increase in TAI-scores was observed (P < .01). (See Tables 3 and
4.) This increase may be interpreted as a shift in the direction
of more desirable teacher-pupil attitudes. Presumably the most
pertinent experiences of the subjects in the Jo-J1 sequence were
general courses in education which were the subjects' first
experience with professional course work. The correlation
coefficient between the TAI-scores of the first and second testing in
this sequence was 0.71.

* Behrens-Fisher Test of Significance between means when the variances are
unequal.

3. The comparison of the data for the So-S1 sequence (seniors)
with the data for the control sequence indicated that no
significant change in TAI-scores had occurred (P > .05). Presumably
the most pertinent experiences of the subjects in this sequence
were student teaching and courses in teaching methods. The
correlation coefficient between TAI-scores of the first and
second testing of the So-S1 sequence was 0.74.

4. The comparison of the data for the To-T1 sequence
(teachers) with the data for the control sequence indicated that a
significant decrease in TAI-scores had occurred (P < .01), which
may be interpreted as a shift in the direction of less desirable
teacher-pupil attitudes. Full-time teaching was the experience
of the subjects in this sequence which was presumably most
pertinent to TAI-scores. The TAI-scores on the two testings
in the To-T1 sequence correlated 0.66.

5. An analysis of the effect of training and early experience
on each of the 239 teacher-pupil attitudes (items) in the
Inventory revealed that a majority of the attitudes were not affected
significantly by training or experience. The first six months of
professional training produced significant changes in the
desirable direction in 20 per cent of the attitudes (items), while
the first six months of experience produced significant changes
in the undesirable direction in ti per cent of the attitudes
(items). There were only four attitudes (items) which were
affected significantly both by the first six months of training
and the first six months of experience (see Table 5).

TABLE 5

The Number of Items Which Showed Significant Change (at the 5 per cent level) in the
Per Cent of the Group Answering the Items Correctly During the
Test-Retest, Jo-J1, So-S1, and To-T1 Sequences
* Per cent of total number of items in the Inventory.

† About half of the items which showed significant response change at the 5 per
cent level also showed significant response change at the 1 per cent level. Sixty per
cent of all items showed no significant response change at the 5 per cent level in any
of the four sequences, and 79 per cent of all the items showed no significant response
change at the 1 per cent level in any of the four sequences. Also, there was no
tendency for the items which showed significant response change to be located in any
particular part of the Inventory.

6. The group of graduating seniors, Spring and Summer 1947,
from which the To group was drawn, was divided into three
major curricular groupings:

1. Early childhood education majors (nursery, kindergarten,
primary, elementary).

2. Academic field majors (English and speech, foreign
language, mathematics, science, social studies).

3. Special field majors (art, home economics, industrial,
music, physical education).

After having first determined that there was no sex difference
in TAI-scores in the sub-groups, it was found that there were
significant differences in TAI-scores among the three sub-groups,
with the early childhood education majors scoring highest and
the special field majors scoring lowest (see Tables 6 & 7). An
inspection of TAI-scores of the Jo group (First-Quarter juniors)
by the same curricular sub-divisions revealed differences among
the sub-groups of about the same order of magnitude as those
observed for the graduating seniors (see Table fi).

Minor findings. 1. TAI-scores and scores on a measure of
intelligence (IMiffo luirm A) correlated o.ofl. The
sample was the group of beginning juniors.

2. The chance-half reliability coefficient for the whole
Inventory was found to be o fffi both by Product-Moment and
the appropriate maximum likelihood estimate formula.

3. The test-retest reliability of the Inventory was found to
be .84 by the product-moment formula and 0 Sto by the
appropriate maximum likelihood estimate formula.

4. A significant increase in TAI-scores (P < .01) was observed
in the control sequence, test-retest at a week to ten-day
interval. This phenomenon is not unusual for instruments similar
to the one used here.

5. There was no significant difference in TAI-scores of those
graduating seniors who taught during the school year following
graduation and those who did not (see Table ft).
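The product-moment coefficients reported in these findings follow the usual Pearson formula, which may be sketched as below. The scores are hypothetical, the function name `pearson_r` is an assumption for illustration, and the maximum likelihood estimate also mentioned in the text is not reproduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same five subjects on two testings.
first_testing = [30, 45, 52, 61, 70]
second_testing = [35, 47, 50, 66, 74]
r = pearson_r(first_testing, second_testing)
print(round(r, 2))
```

For a test-retest coefficient the two lists are the first and second testings of the same subjects; for a chance-half coefficient they would be each subject's scores on the two halves of the Inventory.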

Conclusions and Interpretation

Two major conclusions may be drawn from the findings of
this investigation. The first conclusion is that the attitudes
measured by the Teacher Attitude Inventory are of sufficient
stability to warrant further investigation as to their efficiency in
predicting teacher-pupil relations and in pre-training selection of
teachers. Several of the findings support this conclusion. The
changes in TAI-scores that occurred during the time spans
studied, even though significant in two of the three sequences,
were not of great magnitude. Further, the increase in
TAI-scores that occurred in the junior sequence was practically
negated by the decrease in the teacher sequence so that by the
time a teacher has taught six months his attitudes toward
pupils as measured by the Teacher Attitude Inventory are about
the same as when he began professional training as a junior.
Also the correlation coefficients between first and second
testing in the sequences tend to be just significantly less than the
test-retest reliability coefficients, indicating that the individuals
tend to hold their respective ranks during each of the time
spans studied. The fact that the Inventory was found to be
only slightly susceptible to attempts to fake good gives added
confidence to the conclusion that the attitudes being measured
are rather stable. Finally, the fact that 79 per cent of the
individual attitudes (items) were not affected significantly (at the
1 per cent level) by training, experience, or test-retest at a week
interval permits this conclusion to be drawn with a high degree
of confidence.

The second major conclusion to be drawn from this
investigation is that there are significant differences in teacher-pupil
attitudes among subjects classified by their major curriculum and
that these differences are present in about the same magnitude at
the beginning of professional training as at the end of it, with the
early childhood education majors ranking highest as a group
and the special field majors ranking lowest as a group. This is
particularly significant in view of the fact that the scoring of
the items was determined on groups of teachers distributed
throughout all twelve grades, so that responses of elementary
grade teachers were not unduly weighted in the development
of the scoring key.

It would appear that the attitudes measured by the
Inventory are rather well formed by the time the subject enters
pre-professional training and are influenced to only a minor extent
by training and the first half year of teaching. However, there
is a small group of attitudes that are affected significantly by
training and another group, still smaller, that is significantly
affected by experience. Also, it would appear that teacher-pupil
attitudes are operative in the subject's selection of the field of
education in which he wishes to specialize. It is conceivable
that these attitudes have elements in common with vocational
interest and that a measure of them would be useful in
counseling students about vocational choices.
A STUDY OF CLIENT RESPONSIBILITY: COUNSELOR
TECHNIQUE OR INTERVIEW OUTCOME?

CHARLES F. ELTON

Ohio State University

The recent advances in interview methodology illustrate the
fruitfulness of an empirical approach to the complex
relationships within the interview (7, 8, 10). However, it is obvious
that many pertinent problems remain to be investigated. One
important area of interest is the relationship between counselor
technique and interview outcome. Related to this area of
interest is the current controversy between "directive" and
"non-directive" therapy and the questions which have been raised
concerning the role of client responsibility during the interview.

An important goal or outcome of counseling is the
assumption by the client of self-responsibility (2, 3). Consequently,
the degree to which a client takes self-responsibility has been
used as a criterion of interview effectiveness. However, forcing
a client to take responsibility for the direction of the interview
is a commonly used counselor technique (5, 9). The purpose of
this study is to clarify this dual role of responsibility-taking
during the interview. We shall seek to answer the following
questions: (1) Is the assumption of responsibility by a client
during an interview an important outcome or criterion of
interview effectiveness? (2) Does the throwing of responsibility
upon a client represent a useful counseling technique? And (3)
may the responsibility behavior of a client during an actual
interview be differentiated into that responsibility which is a
criterion and that responsibility which is a manifestation of
counseling technique?

Description of the Data

The data used in this study were obtained from transcripts
of 78 interviews which were recorded in a counseling practicum
offered for advanced students at The Ohio State University.
Of these interviews 12 were conducted by 7 experienced
counselors while the remainder were held by 16 counselors-in-training.
For a period of ten weeks these counselors met weekly
with students in a how-to-study course. Although the problems
of the counselees generally revolved around study difficulties,
other problems of individual adjustment were frequently
encountered. Characteristically, the students in this course are
normal individuals who desire to become more effective in
their college relationships (4).

Definition of Variables

The unit of analysis used in this study consists of the
counselor and counselee remarks which are related to a particular
problem as it is discussed in the interview and is called the
"discussion unit." Previous interview analysis has shown that
there is a high degree of reliability between judges in
differentiating the end of one topic of conversation and the beginning
of a new topic (7). This sort of unit analysis produced 421
discussion units in the 78 interviews. However, 68 of these
were not used because they were too short for reliable
classification or dealt with such special topics as social visiting, making
suitable arrangements, etc. The units used were concerned
with four general topics, namely, study skills, scholastic
questions, vocational problems and personality problems. Because
the last three of these four topics all dealt with making decisions
in different fields, an initial analysis was made to determine
if these topics showed similar distributions. The results
indicated the feasibility of combining the topics of vocational
problems, scholastic questions, and personality problems under a
general classification of decision-making units. Thus, in the
353 interview units there were 148 study-skill and 205
decision-making units.
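The bookkeeping in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the counts and topic names given in the text; the function name `classify_topic` is hypothetical and is introduced solely to show how three of the four topics collapse into one class.

```python
# Counts reported in the text.
TOTAL_UNITS = 421
UNUSED_UNITS = 68
STUDY_SKILL_UNITS = 148
DECISION_MAKING_UNITS = 205

# The three topics combined under the decision-making classification.
DECISION_TOPICS = {
    "scholastic questions",
    "vocational problems",
    "personality problems",
}


def classify_topic(topic):
    """Map one of the four general topics onto the two classes analysed."""
    if topic == "study skills":
        return "study-skill"
    if topic in DECISION_TOPICS:
        return "decision-making"
    raise ValueError(f"unknown topic: {topic!r}")


usable_units = TOTAL_UNITS - UNUSED_UNITS
print(usable_units)                               # 353
print(STUDY_SKILL_UNITS + DECISION_MAKING_UNITS)  # 353
```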

The concept of responsibility-taking refers to the degree to
which the client or counselor is responsible for the direction of
the interview. Within an interview the division of responsibility
may rest, at one extreme, with the counselor, at the other
extreme with the client, or the responsibility may be shared to
a large degree between the counselor and client. This division
of responsibility between counselor and client during each
discussion unit was rated on a five-point scale, 1 representing the
highest degree of counselor responsibility and 5 representing
the highest degree of counselee responsibility.

Since a counselor usually determines responsibility-assigning
through the type of remarks he makes, it was also important
to study these types of counselor remarks. The most important
characteristic of counselor remarks which determines
responsibility is their degree of "leading." By leading is meant the
degree to which each counselor's remark seems to go beyond
the thinking of the client as expressed in his preceding
statement. The concept of leading does not necessarily mean pulling
the client along; it may represent a team-like arrangement,
as between a pass thrower and receiver in football, in which
the counselor speaks in terms of what he thinks the client is
ready for.

Thus, the two important characteristics of a particular
counselor technique are the amount of lead and the degree to which
the counselor is responsible for the direction of the interview.
For example, if a counselor clarifies a client's previous remark
for the client, an often used technique, the counselor is not
introducing any new idea into the interview and, hence, in
terms of relative distance, he has not moved ahead of the
client's thinking, but at the same time he has thrown the
responsibility for the next point to be discussed onto the client.
(It is to be noted, however, that in using clarification the
counselor is leading in the sense that he may select one idea
out of many to respond to.) Suppose, however, that a counselor
urges a client to undertake a certain course of action. In this
instance the counselor may have introduced a new idea to the
client, and if the client does not possess enough insight to
understand the relation of this idea to his problem and is not ready
for the idea, the counselor is quite some distance ahead of the
client's thinking. Here the counselor is assuming, at the same
time, responsibility for the point brought up in the interview.

The "primary counselor technique" for a discussion unit was
determined by tabulating each counselor response within a
discussion unit according to 1 of 10 categories of leading
techniques. Within any interview unit the modal technique was
called the primary counselor technique. The result of this
tabulation indicated that clarification, tentative analysis,
interpretation, and urging were the most frequently used counselor
techniques. Only those units employing one of these techniques
were used in the present study. Detailed definitions of these
techniques are given in a recent article by Carnes and
Robinson (1).
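The tabulation described above, in which the modal category of counselor response within a unit is taken as its primary technique, may be sketched as follows. The unit labels, the category "approval," and both function names are hypothetical; the text does not say how ties were resolved, so the rule used here (the first category to reach the highest count) is an assumption.

```python
from collections import Counter

# The four techniques retained for analysis, as named in the text.
RETAINED = {"clarification", "tentative analysis", "interpretation", "urging"}


def primary_technique(responses):
    """Return the modal technique among a unit's counselor responses."""
    if not responses:
        raise ValueError("a discussion unit needs at least one response")
    # Ties go to the category encountered first (an assumed rule).
    return Counter(responses).most_common(1)[0][0]


def retained_units(units):
    """Keep (unit, primary technique) pairs whose mode is a retained technique."""
    kept = []
    for unit_id, responses in units.items():
        technique = primary_technique(responses)
        if technique in RETAINED:
            kept.append((unit_id, technique))
    return kept


# Hypothetical discussion units; "approval" stands for any other category.
units = {
    "unit-1": ["clarification", "urging", "clarification"],
    "unit-2": ["approval", "approval", "interpretation"],
    "unit-3": ["urging", "urging", "tentative analysis", "urging"],
}
kept = retained_units(units)
print(kept)
```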

Evidence of interview outcomes consisted of ratings of (a)
growth in counselee insight during the discussion unit, (b) the
working relationship between counselor and client, and (c) the
division of responsibility for the direction of the interview.
Five-point rating scales were used for each of these variables.
Sherman found that their reliability was sufficiently high to
justify their use with this type of problem f*;).

Results

1. What relationship does division of responsibility-taking
have to interview outcome? One means of answering this
question is to correlate responsibility ratings with other ratings of
known interview outcomes, e.g., insight and working
relationship. Product-moment correlation coefficients were obtained
separately for 148 study-skill units and 205 decision-making
units. These correlations were of the order of .40 between
responsibility and insight for the study-skill units and
between responsibility and insight for the decision-making
units; .47 between responsibility and working relationship for
the study-skill units and ^ between responsibility and working
relationship with the decision-making units. The possibility of
chance factors contributing to these correlations may be
rejected at the one per cent level of confidence. While a
correlational relationship has been shown, no assumption may be
made from these coefficients as to the linearity of the
relationship nor to the factor or factors involved.

Of further interest is a qualitative analysis of the average
insight and working relationship ratings for each responsibility
rating (Table 1). It will be recalled that a responsibility rating
of 1 signifies the highest degree of counselor acceptance of
responsibility for the direction of the interview, while a rating
of 5 signifies the highest degree of client acceptance of
responsibility for the direction of the interview. In the case of insight
and working relationship a rating of 1 means no insight or a
resistive client, while a rating of 5 means a high degree of client
self-insight or an excellent working relationship between client
and counselor.

The following generalizations are drawn from Table 1: (a)
Counselee insight is not likely to develop when the counselor
assumes, through choice or necessity, the entire responsibility
for the direction of the interview. (b) The middle ratings of
responsibility tend to be accompanied by the greatest gains in
stated insights. (c) A harmonious relationship between the client
and counselor is more likely to be obtained if the responsibility
for the direction of the interview is shared between the
counselor and client. The averages reported in Table 1 are not
amenable to refined interpretation, but it needs to be pointed
out that the relationship between responsibility and the other
outcomes of insight and working relationship is not strictly a
linear one in the sense that high values in one outcome are
accompanied by correspondingly high values in the other. In
other words, the middle range of these rated outcomes is
generally most effective in promoting positive relationships
within the interview.
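
A tabulation of this kind, the average outcome rating at each level of responsibility, can be sketched as follows. The function name and the ratings are hypothetical and are chosen only to show a pattern in which the middle responsibility ratings carry the highest averages; they are not the values of Table 1.

```python
from collections import defaultdict

def mean_outcome_by_rating(responsibility, outcome):
    """Average outcome rating for each distinct responsibility rating."""
    groups = defaultdict(list)
    for rating, value in zip(responsibility, outcome):
        groups[rating].append(value)
    return {rating: sum(v) / len(v) for rating, v in sorted(groups.items())}

# Hypothetical 1-to-5 ratings for eight interview units (not the study's data).
responsibility = [1, 2, 2, 3, 3, 4, 4, 5]
insight = [1, 2, 3, 4, 5, 4, 3, 3]

table = mean_outcome_by_rating(responsibility, insight)
for rating, mean_insight in table.items():
    print(rating, mean_insight)
```

In these illustrative data the average insight rises to a peak at a responsibility rating of 3 and falls thereafter, so a single correlation coefficient would understate the relationship.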

2. Is the throwing of responsibility upon a client a useful
counseling technique? Having found that responsibility-taking
is positively related to insight and working relationship, we
may now investigate the influence of counselor technique upon
the degree of responsibility assumed by the client. In order to
do this the 353 interview units were divided into groups first,
on the basis of topic of conversation, i.e., either study skill
or decision-making, and second, on the basis of primary
counselor technique used, i.e., clarification, tentative
analysis, interpretation, and urging. With the data thus
classified the mean responsibility was computed for all of the
units characterized by the use of a given primary counselor
technique. These mean values are shown in Table 2. Subsequently,
the significance of the differences between the mean
responsibility values was determined by an analysis of variance.
The results are reported in the following two paragraphs.

With study skills as the topic of conversation, the variance
within the primary counselor techniques was [illegible], which is
significant at the 5 per cent level of confidence. However,
between the individual techniques some further differences were
found. The variance difference between the means of
clarification and tentative analysis could be accounted for by
chance alone. This was also true for the difference between
clarification and interpretation. The difference between
clarification and urging was [illegible]; between tentative
analysis and urging it was [illegible], both of which are
significant at beyond the one per cent level of confidence. The
variance between the means of tentative analysis and
interpretation was [illegible]; between interpretation and
urging it was [illegible], both of which are significant at the
5 per cent level of confidence. The foregoing values indicate
that the degree of responsibility assumed by the counselee is
affected by counselor technique and, particularly, that the use
of urging as a counselor technique is likely to keep the
counselee from taking responsibility.

TABLE 2

[Mean responsibility ratings for study-skill units and
decision-making units under each primary counselor technique:
clarification, tentative analysis, interpretation, and urging.
Table title and cell values illegible.]

When the decision-making units were the topic of conversation,
the variance within all of the primary counselor techniques was
[illegible], which is significant at beyond the one per cent
level of confidence. The variance between the means of
clarification and tentative analysis could have occurred by
chance alone and, hence, is not significant. The variance
between the means of clarification and interpretation was 15.05,
between the means of clarification and urging it was 39.05,
between the means of tentative analysis and interpretation it
was 12.39, between tentative analysis and urging it was 35.95,
and between interpretation and urging it was 14.80. All of these
values are significant at beyond the one per cent level of
confidence. Hence, again, the evidence points to the fact that
the primary counselor technique affects the degree of
responsibility assumed by the client and that every technique is
superior to urging in obtaining client responsibility-taking.
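
For readers who wish to see the form of such a test, the conventional one-way variance ratio (F) for responsibility ratings grouped by technique may be sketched as below. This is an assumption about the general method only: the text does not give the computational details behind the variance values reported above, and the function name and ratings here are hypothetical.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of rating lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means about the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of individual ratings about their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical responsibility ratings for units grouped by technique.
clarification = [3, 4, 4, 5]
tentative_analysis = [3, 3, 4, 4]
interpretation = [2, 3, 3, 4]
urging = [1, 2, 2, 3]

f_all, df_b, df_w = one_way_f(
    [clarification, tentative_analysis, interpretation, urging]
)
print(f"F = {f_all:.2f} on {df_b} and {df_w} df")

# A pairwise comparison uses the same function with two groups.
f_pair, _, _ = one_way_f([clarification, urging])
```

The ratio is then referred to a table of F at the desired level of confidence. Passing only two groups gives a pairwise comparison such as clarification against urging.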

Although the analysis above provides, in some instances,
clear-cut distinctions between the counselor techniques, it does
not warrant too broad generalizations about the effectiveness
of these primary counselor techniques. That is, mean values
are given in Table 2; each primary technique had units with
ratings of the highest effectiveness in both working relationship
and insight. Each technique may be useful to the well-trained
counselor; primarily their usefulness is a function of the
situation in which the counselor is forced to operate. It is
entirely conceivable that with certain types of clients and
problems a counselor could use urging with effective results.

3. Is it possible to differentiate the responsibility behavior
of a client during an actual interview into that which is client
initiated, i.e., a criterion, and that which is simply a
manifestation of counseling technique? In the first part of this
paper it was shown that responsibility-taking was correlated
with insight and working relationship. However, our analysis of
variance has shown that technique of leading is highly related to
the responsibility assumed by the client. While the first result
might be used as an argument that responsibility-taking is an
outcome, the second result shows that responsibility-taking is
markedly affected by counselor technique. It is necessary to
know whether responsibility-taking is an outcome when the
effect of counselor technique is controlled. This control was
achieved in two ways.

First, responsibility was correlated with insight and working
relationship separately for each set of units having a common
counselor technique and the same subject matter. These
correlations are shown in Table 3. It will be seen that when the
influence of counselor technique is held constant in this manner
responsibility-taking is still positively correlated in each
case with insight and working relationship.

TABLE 3

Correlation of Responsibility with Working Relationship and with
Insight in Units Characterized by the Same Primary Counselor
Technique

Columns: Clarification, Tentative Analysis, Interpretation, Urging

Study-Skill Units
  a. Responsibility correlated with insight
  b. Responsibility correlated with working relationship
Decision-Making Units
  a. Responsibility correlated with insight
  b. Responsibility correlated with working relationship

[Coefficients illegible.]

Second, some general means are needed to measure
responsibility-taking as an outcome without the necessity of
presenting it in so many sub-divisions. This may be done through
the use of "derived scores." These derived scores were computed
by subtracting the mean responsibility rating for all of the
units of a given topic and counselor technique from each
obtained rating of responsibility in such units. As an example,
suppose that our mean responsibility value for decision-making
units in which interpretation is the primary counselor technique
was 2.40. In computing the derived score, any such unit
classified with a responsibility rating of 3 obtained a derived
score of plus 0.60; with a rating of 2, a derived score of minus
0.40. These derived scores function to control the variable
influence of counselor technique, which is known to influence
responsibility-taking. That is, it is assumed that clients with
positive derived scores wanted to take more responsibility than
was [illegible] forced upon them by the counselor technique,
while those with negative scores did not want to assume it.

Using derived scores the following coefficients of correlation
were obtained: between responsibility and working relationship
with study-skill units, [illegible]; between responsibility and
insight with study-skill units, [illegible]; between
responsibility and working relationship with decision-making
units, [illegible]; between responsibility and insight with
decision-making units, [illegible]. These correlations represent
the relationship of responsibility-taking to other interview
outcomes when the effect of counselor technique has been
minimized, and they substantiate our former conclusion that
responsibility-taking is an important interview outcome or
criterion.
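
The derived score amounts to centering each responsibility rating on the mean of its own topic-and-technique cell. A minimal sketch follows; the function name, the tuple layout, and the five ratings are hypothetical, and the ratings are chosen only so that the cell mean equals the 2.40 of the worked example.

```python
from collections import defaultdict

def derived_scores(units):
    """Center each rating on the mean of its (topic, technique) cell.

    units: list of (topic, technique, responsibility_rating) tuples.
    Returns one derived score per unit, in the same order.
    """
    cells = defaultdict(list)
    for topic, technique, rating in units:
        cells[(topic, technique)].append(rating)
    cell_means = {cell: sum(r) / len(r) for cell, r in cells.items()}
    return [
        rating - cell_means[(topic, technique)]
        for topic, technique, rating in units
    ]

# Hypothetical decision-making units led by interpretation; mean = 2.40.
units = [
    ("decision-making", "interpretation", 2),
    ("decision-making", "interpretation", 2),
    ("decision-making", "interpretation", 2),
    ("decision-making", "interpretation", 3),
    ("decision-making", "interpretation", 3),
]

scores = [round(s, 2) for s in derived_scores(units)]
print(scores)
```

A rating of 3 yields plus 0.60 and a rating of 2 yields minus 0.40, as in the example above. Because every cell is centered on zero, differences in mean responsibility between techniques no longer contribute when the derived scores are correlated with insight or working relationship.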

It is believed that the concept of derived score is an important
contribution to future research. If, as has been suggested
elsewhere, responsibility-taking is a major dimension of the
interview process, a means is now available to further clarify
the complexity of this dimension. Certainly, in planning
counselor training programs, knowledge is needed about the
relative effectiveness of counselor techniques in producing
desirable interview outcomes. The derived score will function in
such research to control the effect of counselor technique while
the variable relationships among the interview outcomes are
investigated.

Two cautions need to be exercised in the interpretation of
the analysis in this study. Responsibility, insight, and working
relationship are symptoms of goals in counseling, e.g., increased
happiness, effectiveness, etc., and are not simply to be sought
in themselves. Because delayed goals cannot be measured in
the moment-to-moment work of an interview, these immediate
criteria are useful tools in judging counseling progress during
an interview. Further, responsibility-taking might be an
independent outcome of counseling; our argument has been proof
by similarity only.

Summary and Conclusions 

An important goal of counseling is to aid the client to become
responsible for his behavior. However, forcing a client to take
responsibility is a commonly used counselor technique. This
dual role of responsibility-taking was investigated. Specifically,
an analysis was made of the relationship between the degree
of client responsibility within the interview and the variables
of (1) interview outcome, e.g., growth in counselee insight and
working relationship, and (2) counselor technique.

The data used were taken from 78 recorded interviews which
occurred in conjunction with a how-to-study course offered at
The Ohio State University. The 78 interviews were broken
down into discussion-topic units.

Responsibility as an interview outcome, and as a result of
counselor technique, was analyzed in three ways: (1) Ratings
of responsibility were correlated with other interview outcomes.
(2) The effect of technique upon responsibility was determined
by an analysis of variance of the mean differences of each
primary counselor technique. (3) A differentiation was made
between the effect of technique and outcome by the use of derived
scores.

It is concluded that: (1) the amount of responsibility assumed
by a counselee can be affected by the counselor technique used,
and (2) when the effect of counselor technique is controlled,
responsibility-taking is related to other interview outcomes
and, as such, it may be used as a criterion of interview
effectiveness.